>> Karin Strauss: Good afternoon. My name is Karin Strauss and it is my pleasure to
introduce Professor Kevin Fu. Kevin almost needs no introduction. He is a regular here
at MSR, but he is an associate professor at UMass Amherst. He is an expert in security
and privacy in embedded devices. He has won many awards including the TR35 and
the Sloan Fellowship, and without further ado, Kevin.
>> Kevin Fu: Thank you Karin. And it is all of course because of having great students
so I have Negin here today and Jeremy also has co-authored papers with me in the past.
You can't have good research without great students.
What is this talk about? I normally work in security and privacy, but about six years ago
we ran into a problem whereby we were trying to get cryptography to run well on an
RFID scale device. We actually have one here today. So my model is a programmable
RFID tag, no batteries; it takes all of its energy from radio waves essentially and performs
computation and sensing and then uploads the data. But we could not get the crypto to work
well, and the bottleneck was that we could not store anything because the flash memory would
not work at low voltage. So this kind of sent us down a path of okay, let's think more about
the platform and rethink abstractions, because if we can't get basic things like storage to
work then there is no hope for the more interesting higher-layer issues like
cryptography and applications in general.
So this talk is going to be covering sort of two different domains of non-volatile storage
on these embedded devices, typically extremely low-power. We are not talking ARM
processor; to me that's like a power plant. We're talking much lower power than ARM
for this talk. This is based on two papers. First we will talk about Mementos which was
published at ASPLOS earlier this year, and it is essentially a set of compiler
optimizations to make programs run well on a platform like that where the platform
might reboot say every 100 milliseconds because of energy loss, because it just can't get
enough energy from harvesting. And then the second half of the talk will be about some
research that Negin had led and it's about how to write the flash memory at voltages that
the manufacturer says don't. And we will show how we can do that well and save quite a
large amount of power.
Feel free to interrupt with any questions. We can always go down a different pathway if
you find a certain topic more interesting than another. Just a little bit of motivation and
kind of to bring us back to some fundamentals, in the 1980’s Mark Weiser was talking
about ubiquitous computing. This was a term he liked to use. And he would say
that the most powerful technologies are those that are effectively invisible when you're using
them. You could almost argue that phones are getting close to being invisible, because
sometimes you forget that it is in your pocket. I am sure we are going to hear something
ring today. So that is ubiquitous. But when a batteryless invisible computer comes to be,
then you run into a fairly simple problem, a simple problem with profound implications
and that is that the thing reboots a lot. When you take away batteries, you no longer have
that safety buffer of energy, and so Mementos is our way of addressing how to get
programs to run well when you reboot a lot.
So we have sort of a three-step approach to getting things done and that is first we took
this emerging platform which Jeremy has there. Again, it is batteryless and programmable,
effectively an RFID tag, and we are going to try to make it more robust. So to make this
programmable RFID tag more robust, we first worked on getting a compiler to do good
optimization and also doing trace driven simulation, because if you just go and try to
build this on hardware, you might find that you just built a wonderful piece of software
that has no use. So the nice thing about a simulator is you can very quickly remove things
that would not have worked on hardware, and get a pretty good sense of how it would
perform on hardware.
So here's a picture of one of the earlier devices. This is the tag that we are starting to
actually manufacture. It is based on the Intel WISP. We call this the UMass Moo only
because we think it looks like a Longhorn steer, but it is a programmable RFID tag, no
battery, and in fact all of its energy is stored right here in this 10 µF capacitor. And I am
thinking they might want me to use this laser pointer instead of walking. This little 10 µF
capacitor is the only buffer we have for energy. The whole microcontroller runs off that
thing. And it gets filled up with this antenna; this dipole antenna does all of the energy
harvesting from RF and it also does the digital communications, the wireless
communication back to whoever is interrogating it.
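Just as a back-of-the-envelope, assuming the capacitor charges to roughly the 3.4 volts mentioned later in the talk (an assumed operating point, not a measured figure):

    E = ½ C V² = ½ · (10 µF) · (3.4 V)² ≈ 58 µJ

so the entire energy buffer for the device is on the order of tens of microjoules.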
To give you a perspective of just how much energy that is, it is about five or six orders of
magnitude less energy than you could store in a AA battery. So imagine trying to
compute with five or six orders of magnitude less energy than a AA. That is what we are
doing. Here are some of the problems you are going to run into immediately when you
try to compute on these extremely small-scale devices. First, here is an actual trace. We
removed the units to focus in on one thing. What do you think of when you see this line
going up and down a lot? What word might come to mind statistically? Does it look
constant to you [laughter]? No. This is highly variable, so this is a voltage trace and the
one thing to take away is we have a highly unpredictable, highly variable voltage supply,
the exact wrong thing you would ever want to give a microprocessor or any digital
system.
So we're going to have to cope with that. So the coping mechanism is a small capacitor
to help us smooth things out. The problem is with a small capacitor when it drains, think
of it as your water tower draining, you don't have water anymore. So in this case instead
of just not getting clean, we don't compute. And when we don't compute, well, we have
reboots and it is typically on the order of 100 to a few hundred milliseconds before we
have a reboot. So I asked the lead graduate student on this, Ben Ransford, to plot the
reboots, and he got creative about it. This is an
actual plot of how often the device in Jeremy's hand there was rebooting. And this was
when it was directly next to a 1 Watt RFID reader, that is, it was getting about
the most power it could possibly get, and yet it was still rebooting. It is hard to compute
like that.
And so we liken this to the idea of having a small child. So imagine you're trying to
write your paper and you have a small child who's trying to get your attention and just
starts throwing this beach ball [laughter] around your head. And every time the child
interrupts your thoughts you forget what you are doing and you have to start over writing
your paper. You're never going to get done with your paper, right? And all of the
authors by the way have toddlers [laughter]. That's what restraints are for, no, um.
Parallelism, I don't know. We eventually got done. But it can be hard to get done when
you have constant interruptions, so we call this a Sisyphean task. In the great myths of the
classical era, poor Sisyphus had to roll a rock up the hill, and every time he got near the top,
he dropped it and had to start over. So you can imagine your computation: you're
rolling it up the hill and every time you get almost done you drop it, and you have to start
all over because you lost your energy. So we are trying to avoid that. We are trying to
figure out how to get these long-running tasks to complete in short bursts. So before we
get into our approach, I will just give you an idea of how it is done today. This is not a
new problem, but this is a different way of thinking about the problem.
So typically the industry will just constrain the problem so you don't have to worry about
reboots. So this is a contactless smart card, and has anybody been on the London system
yet? So this is the London Oyster card they use for payments and there is a pretty
standard rule: 300 milliseconds is the golden number. Anything you do, whether it be crypto or
communication, it needs to finish in 300 milliseconds, otherwise there is sort of an
exponential increase in passenger frustration. As they are swiping the card, passengers get
angry at it and they don't understand why it is not working, so everything always finishes in 300
milliseconds. And if your computation takes more than 300 milliseconds, they say well,
you shouldn't be doing that in the first place. So we think that is a silly idea for
general computation, so Mementos is going to relax this constraint so we can make
general-purpose computation feasible. Of course it may take many, many reboots.
So in the typical world of energy, you are going to be given a relatively constant voltage,
which, if you integrate it, is infinite energy over time. But in the real world, our
world, these are real traces over 40 seconds each. Your voltage is going to vary.
And it is going to vary quite a bit. These three different traces were of a student holding
this tag harvesting energy and doing the exact same thing each of the three times, but you
get a radically different voltage trace. So we have to be able to cope with this variability.
So the approach to dealing with a variable voltage supply and all of these constant
reboots is fairly straightforward. It is checkpointing and I think many of you are
probably aware of the checkpointing work. There is tons of research on checkpointing
especially in the last few decades for things like grid computing, which I guess is now
called cloud computing. Here is a trace of our voltage as we are about to die. So 1.8
volts up here is where our particular microcontroller will start to fail and so that is the
death point right there. And this is actually part of an exponential decay, but we are kind
of zoomed in so it looks linear. But imagine that your computation is rolling along and
there is some region as you're getting close to death where you think well, that is a good
place to checkpoint what you are doing. So that is what we do. As the computation
comes along, we checkpoint it before we die, and at that point we sort of go into a
hibernation state as we are harvesting energy. When we get enough headroom and
enough energy we can start over again, do the restoration, and then continue our
computation. So that is the easy part. Question?
>>: Just a quick question before you [inaudible] why don't they just increase the
capacitor…?
>> Kevin Fu: Good question. We tried increasing it; actually, Jeremy increased the
capacitor with a super capacitor. You can increase the capacity. The problem is that the
voltage on a capacitor depends on how much energy is inside of it, and basically if
you put, say, a super cap on here, it might take 24 hours to charge before it responds. So
there is a trade-off between responsiveness, and for us we want one second or less
response time, versus this sort of insurance policy of how to cope with energy outages.
>>: So you need multiple capacitors instead of [inaudible]?
>> Kevin Fu: You could have a bank of capacitors…
>>: But it increases the cost.
>> Kevin Fu: It increases the cost and complexity. So for most of our work we focus on
sort of the fundamental case of a single small capacitor. And there are a lot of fun tricks
that you can do to use super caps, but it always involves something like hybrid energy
harvesting, adding solar panels or other energy sources, at least in what we have found so far.
Yes?
>>: I have a question. To do the checkpointing, you need to know at what point in the
future you need to do the checkpointing. It has to be predictable, right? But what if
something happens out of the blue that you haven't accounted for, and then you lose data?
>> Kevin Fu: That is precisely the hard part of the problem that we are
going to try to address in the next few slides. The easy part is knowing that we have to
do checkpoints. The hard part is knowing when to checkpoint and what to checkpoint.
We are going to use a running example of a CRC check as our application. It might
sound a little silly but believe it or not CRCs are still pretty hard to do on these platforms
just because they are so low-power. So it takes just slightly more than half a second to
run this code and it takes about 575,000 CPU cycles. The interesting thing to note is that
we reboot about every 100 milliseconds, so we are never going to get done with this very
simple CRC without at least checkpointing. And then the next question is, what are the
best checkpoint strategies so we can get pretty close to some kind of minimal running
time. So that is what was driving us to get the programs to run well.
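To make the running example concrete, here is the kind of bitwise CRC-16 loop we mean (a generic sketch, not the code from the paper; the CCITT polynomial and buffer handling here are just illustrative):

    #include <stdint.h>
    #include <stddef.h>

    /* Generic bitwise CRC-16 (CCITT polynomial used as an example). */
    uint16_t crc16(const uint8_t *buf, size_t len)
    {
        uint16_t crc = 0xFFFF;
        for (size_t i = 0; i < len; i++) {
            crc ^= (uint16_t)buf[i] << 8;
            for (int b = 0; b < 8; b++)
                crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                     : (uint16_t)(crc << 1);
        }
        return crc;  /* on a low-MHz MCU this adds up to a lot of cycles for a big buffer */
    }

A loop like this never finishes within one burst of harvested energy, which is exactly the problem.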
So let me give you an overview, a programmer’s overview of how you would use our
system Mementos to make your code run. You can write your code in C, actually you
can write in any language that LLVM supports. So we use the LLVM compiler instead
of GCC; it's a little more modular, easier to modify. So you write your code and you will
need to choose certain parameters, and these parameters will
determine the efficacy of your checkpoint strategy. Now, using the parameters is what's
done by Mementos. What Mementos does is instrument your code, its intermediate
representation, during the compilation stage. It instruments it with energy checks. It
becomes energy aware. It has a notion of what is a capacitor and how it behaves in
different conditions. And then we built a simulator, but it is not just any old simulator; it
is a trace-driven simulator. It actually takes as input voltage measurements from an
oscilloscope so we can see how the program would behave in a realistic scenario. It
makes it reproducible so you can actually learn from it. The simulator helps us pick these
parameters so that we can dump it onto hardware and see it actually run.
So that is the basic idea of what Mementos is doing inside the compiler and inside the
simulator. The next piece is understanding the heuristics we used to actually instrument
code. So I said there were two things that we have to solve. We need to figure out when
to checkpoint and what to checkpoint. So this is going to answer the when-to-checkpoint
question. So here is your code. What is a good place to checkpoint? You might choose
to checkpoint at the end of a loop latch; that makes some intuitive sense because
you've just finished some important computation, so there's probably
something worth saving in your registers. So that seems like a logical place to do it, or maybe
after a function call return, because in traditional stack oriented machines, if you do a
function call you push all of your arguments on the stack. You do your work. You come
back and you pop everything off. So we would expect to have a local minimum in our
stack and therefore much less state to have to checkpoint in the first place. We have a
number of other instrumentation strategies, some involving timers and that is in the
paper, but I think these two should give you a sense of how we are going to decide where
and when we are allowed to have checkpoints.
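To make that concrete, here is a rough sketch of what instrumented code ends up looking like (my own illustration, not Mementos' actual output; every name here is made up). A trigger-point call is inserted at loop latches and after function-call returns, and the trigger only checkpoints when the measured supply voltage has dipped below the chosen threshold:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical helpers -- placeholders, not the real Mementos runtime. */
    extern uint16_t adc_read_vcc(void);          /* voltage measurement via the ADC */
    extern void     checkpoint_to_flash(void);   /* save registers + RAM, then hibernate */
    extern uint16_t crc16_step(uint16_t crc, uint8_t byte);

    #define V_THRESHOLD_ADC 620   /* placeholder raw ADC code for the chosen threshold */

    static void mementos_trigger(void)
    {
        /* Only checkpoint when the supply has dipped below the threshold. */
        if (adc_read_vcc() < V_THRESHOLD_ADC)
            checkpoint_to_flash();
    }

    uint16_t crc16_instrumented(const uint8_t *buf, size_t len)
    {
        uint16_t crc = 0xFFFF;
        for (size_t i = 0; i < len; i++) {
            crc = crc16_step(crc, buf[i]);
            mementos_trigger();        /* trigger point inserted at the loop latch */
        }
        return crc;
    }

    void application(const uint8_t *buf, size_t len)
    {
        uint16_t result = crc16_instrumented(buf, len);
        mementos_trigger();            /* trigger point inserted after the call returns */
        (void)result;
    }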
In other words we are not going to do a checkpoint right in the middle, because we don't
think that is logical. The next thing we did is we simplified the problem of knowing--let
me back up a bit. That was static time. That was compile time where we decide where to
put potential checkpoints. At runtime we need to know, okay, we have entered an area
where we could do a checkpoint; should we actually do a checkpoint? To do that we
pick a threshold, and if our voltage supply has dipped below this threshold, we say all
right, let's do that checkpoint. So as we get to death, this is a happy talk [laughter].
Imagine you have your checkpoint. And don't forget it is not instantaneous. Your
checkpoint does take time. You could checkpoint early. You could be very conservative.
Maybe you are a fiscal conservative and you will do this. But then you will find that you
are just wasting energy, because if you think about it for a moment, if you are
consistently checkpointing early, there is that little tail and what that means is every
single time you run your program you are going to have a much shorter run, and you are
going to end up having many, many more runs and so you are going to be paying a lot
more energy in your restoration checkpoints, because you will be doing more of them.
On the other hand, if you were to checkpoint a little bit too late, like my little brother,
then you are probably just going to have to start over or give it to your older brother the
processor to finish. So you don't want to miss that final point. So really what you want is
this Goldilocks checkpoint area. You want to do it right before there. You do this sort of
flatlining right before you die, have the checkpoint happen. And it is pretty tricky to
locate, but if you make some assumptions about how your checkpointing process
works, you can still pick a voltage threshold on the conservative side such that it is very
unlikely to die without having checkpointed. Yes?
>>: So you are assuming, and I assume you are probably correct almost all the time, that
the checkpoint is really expensive, but if the checkpoint were really cheap you could just
do it all the time.
>> Kevin Fu: If checkpoints were cheap, we would do it all the time.
>>: Couldn't your code detect though, for this particular program the checkpoint is just
[inaudible] this tiny CRC is…
>> Kevin Fu: That is correct. There is still a huge amount of optimization that I think
can still be done. For instance, later on I will--I would just say now that it would be nice
to have more dynamic understanding of the program. So right now we just assume sort
of a worst-case scenario; we checkpoint all registers and all RAM because RAM is small.
The earlier version only had 256 bytes of RAM, so we would just checkpoint everything.
But as RAM grows on these devices, we are going to have to be smarter, think about
compression, think about more of those interesting things. But writing to flash memory
is a very expensive operation on this device, because you have to jack the voltage up in
order for it to run well.
>>: Have you tried choosing [inaudible] dynamically, so you lower it until things
misbehave and you bump it back up until your [inaudible]?
>> Kevin Fu: We haven't done anything dynamic yet. I think that is the next step. Right
now, we have said we need to simplify the problem so we can at least make it solvable.
Oh, one more question.
>>: So you have this voltage threshold set and then you probably have a monitoring unit that
monitors [inaudible] as soon as it gets [inaudible] does that. So what about the overhead
of the monitoring…
>> Kevin Fu: I do have a slide that talks about the cost of the monitoring. If you can
wait it will be probably about 10 slides later. But there is a significant cost to monitoring
so you also don't want to do that just all of the time. So already you can see it is pretty
hard to do. Let me just tell you about some other things that make it even harder. Our
flash memory has no flash translation layer. We have to manage the whole thing on our
own. So oh, you filled up your flash and you need to erase a segment of flash before you
can write to just one word of it. You've got to do that yourself. And by the way, you're
checkpointing your registers, so you can't use any registers to make decisions. It is kind of
circular thinking, right? It is really hard to write code, when you are trying to checkpoint
the system itself. And you can't just overwrite arbitrary things in flash memory. That is
not how it generally behaves. It is very painful. I feel sorry for Ben because this was a
huge amount of development. Let me just tell you about some of the debugging. I
believe he was saying, and correct me if I am wrong, I believe he told me he actually
implemented Manchester encoding over the LED for debugging, which to me
is the equivalent of printf debugging. It is really hard to debug these things when there is
no user interface. And moreover, if you touch it with a wire you might actually disturb
its performance. So it is very hard to work with these devices.
Fickle harvesting, those first three slides I showed you, these three voltage traces. It is
completely not reproducible. If you run an experiment and have a colleague at another
university replicate it, forget it, because it is a different humidity over there. So that is
why we focus on making these reproducible.
>>: You can't use a register to make decisions, how do you make decisions?
>> Kevin Fu: You just need to be careful, [laughter] very careful, and this means you
lose a lot of the abstractions that we take for granted. So to bring back a little bit of
reproducibility and science to all of this we decided we needed to have a trace driven
simulator. If we couldn't get people to reproduce the results, then what is the point? So
this is based on a tool called MSPsim. It has already been out there, Ben and others
augmented it to actually capture notions of energy. Literally what he did is he took a
scalpel and cut off the energy harvesting front end of our little device. We would
measure the voltage coming out of it, and then that voltage would be fed into our
software's simulated capacitor, so it is sort of a first-order physics approximation of how
capacitors behave under different voltages. And then we throw that into the simulator
and this causes the simulator to have a notion of when to fail, when to reboot when there
is not enough voltage.
Again it is sort of physics 101. There are all sorts of weird nonlinearities when you are
dealing with capacitors. But we did validate this. We actually measured the end-to-end power
consumption, energy consumption rather, in order to verify whether our simulator was
pretty good. And you can see the paper if you want to know more about that accuracy.
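For intuition, a first-order capacitor model of this kind can be as simple as the following sketch (a crude Euler integration with made-up numbers, not the actual MSPsim extension): each simulation step, the capacitor voltage moves according to the harvested current and the current drawn by the simulated MCU, and the simulator declares a brown-out once the voltage falls below the operating minimum.

    #include <stdio.h>

    /* Crude first-order model: C * dV/dt = I_harvest - I_load.
     * All parameter values below are illustrative placeholders. */
    #define CAP_FARADS  10e-6      /* the 10 uF storage capacitor */
    #define V_BROWNOUT  1.8        /* MCU starts to fail below ~1.8 V */
    #define DT_SECONDS  1e-5       /* simulation time step */

    static double step_capacitor(double v, double i_harvest, double i_load)
    {
        return v + DT_SECONDS * (i_harvest - i_load) / CAP_FARADS;
    }

    int main(void)
    {
        double v = 3.4;                       /* start fully charged */
        for (long t = 0; t < 100000; t++) {
            double i_harvest = 0.0;           /* would come from the recorded trace */
            double i_load    = 0.5e-3;        /* assumed draw while computing */
            v = step_capacitor(v, i_harvest, i_load);
            if (v < V_BROWNOUT) {
                printf("brown-out after %ld steps\n", t);
                break;
            }
        }
        return 0;
    }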
You've got this program that our handy-dandy Mementos pass has instrumented with
energy checks. Now we would like to see how it actually performs in the simulator. So
the simulator takes as input two things, the executable, the instrumented executable, and
a voltage trace. This allows it to be reproducible because you can give the voltage trace
to somebody else and they can go run it. The output is a little more complicated. The
simulator is going to tell you in the ideal world here is how many reboots it is going to
take to finish. Here's how many CPU cycles and total wall time and also an execution
trace so you can go back and see why things fail. To give you an idea of what it looks
like, here is a real trace that has been annotated a bit with illustration. The blue down
there on the bottom represents the energy harvesting front end. When we harvest energy
it is fairly low voltage and that we use some special circuitry; it is called a charge pump
in order to get the higher voltage and we put that into that 10 µF capacitor. So you can
see it takes about 3000 µs until we get up to about 3.4 volts. Suddenly our hardware
turns on and starts to compute. And we are sliding down that voltage. As we compute,
the voltage on the capacitor is going down until we fail at the red mark, where we do a
checkpoint, right before failure. And then that little green dot up there represents a
restoration. So you can see it takes many reboot cycles to get this CRC check done.
The simulation is great, but what would be better would be if our simulator could tell us
not only how our program performs but how well it could perform. So during the
rebuttal process we decided we needed to implement what we called an Oracle. It
essentially, with certain assumptions about the workload, finds the optimal
way you could ever possibly hope to do checkpointing. It does not necessarily imply that
you could reach this optimal. It doesn't even tell you the strategy to do, but it says here is
the best thing any strategy could do. You can think of it as sort of a binary search on our
threshold voltage. So we are assuming that the only knob we have is the threshold and
with this knob we need to know where to set it, and we need to pick that knob to be
static because we don't yet have any dynamic control.
So the first thing we do is we run the program at a very low threshold, and of course we
are going to die. And we keep bringing that threshold up until we don't die and then we
think oh, let's see if we can inch it back a little bit more and so we do binary search on
those little deltas until we can finally zero in on what we consider to be a pretty good
threshold voltage where we don't expect to get any closer to death and still survive. So
that is how our Oracle works. The Oracle then outputs what was this running time. So
we can use that as sort of a benchmark of how good are we.
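Roughly, the Oracle's search amounts to something like this (a paraphrase of the idea with made-up names; simulate() stands in for a run of the trace-driven simulator that reports whether the instrumented program completed):

    /* Binary search for the lowest static checkpoint threshold that still
     * lets the program finish on a given voltage trace. */
    extern int simulate(double threshold_volts, const char *trace_file);

    double oracle_threshold(const char *trace_file)
    {
        double lo = 1.8;     /* too aggressive: the program dies before checkpointing */
        double hi = 3.3;     /* very conservative: always survives, wastes energy */
        while (hi - lo > 0.01) {          /* stop at roughly 10 mV resolution */
            double mid = (lo + hi) / 2.0;
            if (simulate(mid, trace_file))
                hi = mid;                 /* survived: try inching the threshold lower */
            else
                lo = mid;                 /* died: need a higher threshold */
        }
        return hi;                        /* lowest threshold known to survive */
    }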
How did we actually do this? At the high level, again, Mementos is splitting these
computations into chunks. At compile time it tells you where the edges of those
chunks of computation are, and then at runtime it tells you sort of which ones are actually
going to cause interesting events. We are using the CRC example; we have also
done some elliptic curve and RSA, but CRC is sort of the simplest example to demonstrate
how the thing works. Unfortunately there is not a lot of other code out there to compare
against. Nobody else wanted to go through this pain. I can understand why. So what we
do is we compare against our checkpoint Oracle. So at least we know how close we are
to ideal.
To give you an idea of just the scale, Mementos itself is only about 2 kB. And one
kilobyte of our flash memory is reserved for the checkpoints themselves. We have
basically banks of checkpoints and it is kind of tricky to swap between them, because we
don't want to erase a checkpoint before we know our next checkpoint is actually set. It is
kind of tricky to do when you don't have full control of your registers. This is answering
your question about why we don't just constantly monitor. Doing a measurement of the
voltage supply takes about 0.1 milliseconds. This is mostly because of the analog-to-digital
converter. This is a fairly significant cost. It translates into a large energy cost. It is
pretty close to a flash write, so we don't want to do it all the time. But then the checkpoint
itself only takes about 4 milliseconds and the restoration two milliseconds.
One other thing to point out. The question we often get is why not just use TinyOS and
believe me we wanted to use TinyOS. TinyOS is this operating system, library rather, for
sensor motes, and the problem we ran into is that it took about 100 milliseconds to boot
TinyOS, isn't that right? And that was after you optimized it, right? So if you optimize
TinyOS you can get it to boot up in about 100 milliseconds, and we needed two milliseconds.
We much preferred two over 100; otherwise we would never get anything done. We would
boot up and we would say goodbye.
So let's take a look at this CRC test case, and we ran it through our Oracle using this
energy trace, so we just randomly picked this guy. In the paper we have I think on the
order of 10 different traces to exercise different characteristics. And the Oracle told us
oh, it will take you about 4 seconds and the best you could ever hope for would be 14
reboots and, by the way, we recommend, based on our Oracle, picking a threshold of 2.35
volts. Just pick that, again not dynamic, static, and that's what you should get. So we ran
this against the simulator with our actual checkpoint strategy, because we can't just pick
any strategy, we have to pick something the compiler knows how to do, and we were able
to finish with about 16 reboots in slightly more than 6 seconds, and that was with a
threshold of 2.6 volts.
It's not as good as ideal, but it is getting pretty close and that is sort of where we are
today. And I can just point out that sometimes you read a systems paper and
you ask is this a paper about improving performance, or is it about better semantics? And
this is definitely not about better performance. Here it is about semantics. We are trying
to get programs to actually run. The status quo is programs don't run. So at least we get
them to run, and a penalty of about 10X is pretty typical, but at least we can get them to
complete, if there is a pathway to completion. You have already seen that without
Mementos, if you don't have any kind of checkpointing strategy, you get
these Sisyphean tasks: you roll your computation up the hill and you keep dropping it
and you will never finish.
There is tons of related work in this space, in RFID scale devices; especially I will point
you to a paper by Michael Buettner at University of Washington, Dewdrop at NSDI. He
has a different approach to dealing with energy constraints. They essentially have a
scheduler and the scheduler will look at your energy levels and decide which tasks to run
based on how much energy is left. So if you are very low on energy, you might only do a
quick task. In our work, we are thinking that we have a single task, so we don't really
have a choice to do any scheduling. And again checkpointing has a rich history. I
usually think of things like Condor from I think it was the early ‘80s or maybe it was the
early ‘90s, but from Wisconsin on grid computing, because if you want to do migration
of your computation you need to checkpoint it. So there are a lot of interesting things
that you can borrow from the checkpointing literature of yore.
As Jay mentioned, why not do dynamic? We think it would be really cool to have some
kind of dynamic decision making. We just kind of chose to solve a simpler problem first.
It would be a great extension and I would hope that we could do better, but it is not
obvious to me if dynamic would actually make it better, because anytime you need to
make decisions it is going to cost you CPU and it might actually make it worse if you
miss the deadline, so it is not clear if it will work. We are looking into all kinds of other
non-volatile memories, phase change, ferroelectric RAM; some of these might make it a lot
easier to do. That is still yet to be seen. We originally were doing some compressed
checkpoints. But as you can imagine, if you don't have full access to your registers, it is
really hard to write compression code. So we decided not to do that. Actually I think
Ben said please, do I have to do this? And I said okay, we will wait until the next time.
And we do have a platform now that we are selling. We are selling this guy for research
purposes. There is my model again. Thank you, Jeremy [inaudible]. If you would like
to play with that or have one, come talk to me later.
Oh wow, a checkpoint of the first half of my talk. It is a good loop latch. Hopefully I am
not going to repeat. Should I just start over [laughter]? So Mementos is all about energy
aware checkpoints to help these batteryless devices get long-running things done. The work
won't necessarily get done quickly, but it will at least get done. And it is using the LLVM
compiler optimization passes to make this possible, trace driven simulators so that you
don't have to do it on hardware and find out that it doesn't work anyway. You can do it
on software first before you get into the nitty-gritty of circuits. And the applications,
there are all kinds of fun little things. We used to think of crypto and implantable
medical devices, but I am working with one evolutionary biologist who wants to use
some of this for insect scale tracking, so we recently were talking to experts from Cornell
who are micro-surgeons for crickets, to be able to implant these inside crickets for
tracking. But I am not sure if that is going to work out. It would be interesting if it does.
So that is a checkpoint, but I think there is a question before I…
>>: I just want to clarify something you said, it sounds like as far as the [inaudible] goes
there's a lot of work to compute the optimal perfect [inaudible] voltage and then add 10%
[inaudible]
>> Kevin Fu: Roughly. And there are other constraints such as you can't just buy a
voltage regulator for any old voltage you want. There are certain manufacturing
constraints, discrete choices available. So any other questions before moving forward?
So the next thing I am going to do is talk about some work that Negin led over the last
couple of years, and it all relates to getting flash memory to work well, saving potentially
upwards of 30 to 50% of the energy by basically violating abstractions. So we are going
to violate abstractions with glee and try to make things still work. Big data, so here is the
typical picture that you see of a data center and you think about massive amounts of
storage and making those run using fewer joules of energy. The kinds of things we are
going to care about though are if you take a look at our doctored photo, up there in the
ceiling, you see a smoke detector. And that smoke detector has flash memory inside of it.
Does anyone know why the smoke detector has memory?
>>: Insurance companies?
>> Kevin Fu: Yes. Well, actually, it is to be used against the insurance companies. Now let me
explain. I recently went to a tear down of
these devices and the manufacturers said that the reason that there is flash memory in
their smoke detector is that, let's say your house burns down. And your insurance agent
or your insurance company goes to the manufacturer saying well, obviously your smoke
detector failed so we are going to sue you. Smoke detector company says let me see that
smoke detector. Let me hit a few buttons, take a look. Oh, according to the flash
memory, you never hit the test button. That means you obviously are an irresponsible
homeowner and I think you are the one liable. So they actually use the flash memory
against the insurance companies and against the homeowners for liability, so it is for
legal reasons.
Anyway, there is flash memory inside there. And there is flash memory in a bunch of
other very low power typically battery-powered devices; I like to call them boring high
volume devices. So thermostats, smoke detectors, implantable devices (those are a little more
exciting), tollway transponders, kind of boring stuff. But there is a lot of it, so there is
over 10 billion in sales of these kinds of things each year in the US alone. Now there is
also flash memory in a thing called solid-state drives, and I know many people are
excited about that. We are talking about things much deeper on a smaller scale. Some of
these things I'm going to talk about might apply to solid-state drives in the future, but
right now the price points are not very good to making this applicable to solid-state
drives.
So let's focus in on the smoke detector. If you crack it open you're going to find a
microcontroller. This one happens to be from Microchip and it has 8 kB of embedded
flash memory. And the interesting thing in the spec for the microcontroller is that it tells
you that you can run it at, say, 2.2 volts or you can run it at 4.5 volts, and this gets interesting
from a power consumption perspective because power consumption is proportional to V
squared. So if you run the thing at a high voltage it is not just linearly worse; it is
quadratically worse. So let's think about how the manufacturer would use this in
deciding which voltage to set. In the ideal world, you can imagine you have a single chip
that contains your CPU and your flash memory, and it has two pins, the two different
voltages. This allows you to scale your energy proportional with the workload. If you
are not using your flash memory, you just don't bother using the 4.5 volts. If you are
using the flash memory all the time okay, sure you are going to burn up a lot of energy,
but you can dynamically scale it. Unfortunately in the real world, what happens is energy
is proportional to the worst-case, no matter what you are doing. And the reason why is it
is very expensive to put extra pins on a chip. So there is typically just one pin; you get to
pick one voltage to rule them all and so if your CPU needs a low-voltage to run and your
flash needs a really high-voltage, the manufacturer is going to pick the high-voltage
whether or not they use the flash memory, they are going to set it to the high-voltage
because they might need to use the flash memory.
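For rough intuition about what that quadratic dependence costs, if you treat the active power as dominated by CMOS switching power (which scales with the square of the supply voltage), then

    P ∝ V²,  so  (2.2 V / 4.5 V)² ≈ 0.24,

i.e., the same logic running at 2.2 volts draws roughly a quarter of the switching power it draws at 4.5 volts; that factor of four is the headroom a single worst-case supply setting gives away.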
So we are going to try to make these chips behave more in the energy proportional world
even though we are actually running in a world where we have to pick a single voltage to
work for everybody. And we are going to do that by violating specs and abstractions but
still bringing back reliability of our storage. So we are going to try to recover that wasted
energy. Our approach is strictly software-based, so the cool thing is you can change some
software abstractions in how you interact with the microcontroller, and there is this
strange interplay with the physics of flash memory such that you can then save power. It
makes no sense why it should save power strictly from a software perspective, but if you
think of a bigger system, we will show you soon about why we can save power.
Okay. A little bit more about the state-of-the-art. I already told you sort of the way
things are done today is the manufacturer will pick the highest voltage because it is just
the easiest thing to do but then your device won't last as long. It will use up the battery
more quickly. And the other approach, if you have a lot of money, is you just add hardware,
so you could do sort of a poor man's discrete voltage scaling, that is, you could change
your voltage on the fly. Unfortunately, this does require circuitry. And in high volume, low
profit margin devices they really don't want to spend $0.10 on even a couple of transistors.
So this is something that is generally avoided. And then I like to poke fun at Josh Smith.
He was at Intel, now he is at University of Washington. Our original motivation for this
was he gave us one of his WISP tags, this programmable RFID, and we were going to
write to flash memory and he said oh, you can't write to flash. There is flash memory
there but you can't write to it because we set the voltage too low. So they literally just
disabled the flash memory, and we wanted to get it to work, so we wrote to it at the
voltage we had, not the voltage we wanted.
And this picture should start to look pretty familiar. Okay. So the basic idea here is your
standard engineering trade-off. We are going to run the thing at a very low-voltage. We
might get some errors. And we are going to need an error correction routine and we need
to know well, is that error correction going to cost so much that it makes the savings not
worthwhile, or not. Now the good news is, I will give away the ending, we can actually
save more than we need to spend to do the correction on almost all workloads that we
have encountered. So let's go back to a little more inspiration. Who used a punchcard
[laughter]? Are you proud of it?
So in 1982 Rivest and Shamir had this cute paper about what they called wits,
write-once bits. So this was back in the day when write-once memories, WOMs, were fairly
common. You could flip a bit once in one direction, but you could not
flip the bit back. So it was very much like a punchcard, but electronic. And they
figured out a clever coding scheme where they can make the memory rewritable by
actually having the next write be sort of a superset of the previous write and they would
trade-off capacity with rewrite capability.
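The canonical construction from that paper, as I remember it (so treat the details as an illustration rather than a citation), stores a 2-bit value twice in 3 write-once cells: the first write uses codewords with at most one cell set, and a second write of a different value sets every cell except the one that encodes the new value, so cells only ever flip in the allowed direction. A sketch:

    #include <stdint.h>

    /* Rivest-Shamir-style write-once-memory code, reconstructed from memory.
     * Cells start at 0 and can only be flipped to 1.  First generation uses
     * codewords of weight <= 1; second generation uses their complements. */
    static const uint8_t GEN1[4] = { 0x0 /*000*/, 0x4 /*100*/, 0x2 /*010*/, 0x1 /*001*/ };

    uint8_t wom_first_write(uint8_t value)                 /* value in 0..3 */
    {
        return GEN1[value & 3];
    }

    uint8_t wom_second_write(uint8_t cells, uint8_t value) /* cells from first write */
    {
        if (cells == GEN1[value & 3])
            return cells;                                  /* same value: flip nothing */
        return (uint8_t)(~GEN1[value & 3]) & 0x7;          /* superset of any weight-<=1 state */
    }

    uint8_t wom_read(uint8_t cells)
    {
        int ones = ((cells >> 2) & 1) + ((cells >> 1) & 1) + (cells & 1);
        uint8_t pattern = (ones <= 1) ? cells : (uint8_t)(~cells & 0x7);
        for (uint8_t v = 0; v < 4; v++)
            if (GEN1[v] == pattern)
                return v;
        return 0;                                          /* unreachable for valid states */
    }

Two bits written twice in three cells beats the four cells you would need naively, which is the capacity-for-rewritability trade just described.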
So we took a look at that work and we thought well, what if we sort of write to these
write once bits sort of halfway, so this is where our halfwits work comes in. And let me
tell you about just one example of how you might use this approach to write to halfwits.
So we are using NOR flash memory and in NOR flash memory, all bits are born as one.
They are all equal. And let's say you want to write a value to that memory where the two
least significant bits are zero; it seems like a pretty simple operation, but if you are doing it
at low-voltage, you might get some errors. The bit flip might not actually happen. So
imagine that you do the write and then the least significant bit does not flip. That is an
error and we will have to correct for that. That is the bad news.
The good news is this is what you call a Z Channel. It has these asymmetric probabilities
of failures, so if you are trying to write a zero, there is a probability P that it will fail, but
if you are trying to write a one it is never going to fail, because they are born as ones. So
we are going to be able to exploit some of these asymmetries to get better energy
consumption.
>>: Is this independent [inaudible] the same bits?
>> Kevin Fu: No. This is where it gets interesting. So there are a whole bunch of
different factors that you may want to consider. And so Negin did all of the experiments
on this and all of the experimental setups. So we considered a bunch of different things.
The ones at the bottom I am not going to talk about, because we have found no evidence
that they actually influence the probability of error, but we will talk about the first three
factors that influence errors before we get into our approach. We have one obligatory eye
candy picture; you always need one of these in any talk involving electronics. We
now have a digital power supply; this is kind of error-prone, right? If you accidentally
bump it, the voltage changes. But Negin can tell you more about this set up. This is her
beautiful setup here. And we are running at 1.813 volts at the moment and we are testing
the platform with a second device to monitor how well it is performing and actually
how much energy it is using.
So let's get to the data on just how the voltage affects the error rate. It should be obvious
that we are going to get errors. The question is what does that error distribution look
like? So here is a chip that was designed to work at 2.8 volts, I believe.
>>: [inaudible]
>> Kevin Fu: 2.2 volts, sorry. And you can see as you are edging your way down, there
is sort of this cliff where suddenly a whole bunch of errors pop up. And so this was in
one microcontroller. There is a second microcontroller that we had that was the exact
same model and it had a radically different cliff. So there the errors started around 1.92
volts. So variability again comes into the picture because of process variation. If you
start looking at other microcontrollers, of different families of microcontrollers from the
same manufacturer, you're going to get different curves, different points of failure. So all
we know is that there will exist a cliff. We don't know exactly where that cliff is but we
do know that there are pretty big safety margins built into the manufacturing, so anything
above the cliff is wasted energy, basically.
One other thing that I consider sort of just a confirmation of what we would expect is that
Hamming weight is directly influencing our probability of error. So basically if you have
a lot of zeros, you are going to have a higher probability of error, and it just makes sense
because flipping from a one to a zero has some probability of error, while
writing a one involves no change, so there should be no error.
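As a sanity check on that intuition: if you assume each attempted one-to-zero flip fails independently with some probability p at a given voltage (an independence assumption, not something established here), then a byte containing w zeros fails with probability

    P(byte error) = 1 − (1 − p)^w ≈ w · p   for small p,

which grows roughly linearly in the number of zeros and saturates toward 100% as w·p gets large.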
>>: Are these [inaudible] bits or for the whole byte or whatever?
>> Kevin Fu: For the whole byte, so this is the Hamming weight of the byte, how many zeros are
in it, excuse me, how many ones are in it. So if we have all zeros, all
eight zeros in a byte, it looks like maybe an 80 to 90% chance of some error in the byte.
In the back?
>>: Do you assume you are [inaudible] structure [inaudible] or something? Do you only
write to the end of it? You never change…
>> Kevin Fu: We have no file system. So this is raw access. There is no flash
translation.
>>: You're not going to go back and change [inaudible]?
>> Kevin Fu: We are going to do something very much like that. This is just
observational data, trying to figure out how the raw flash data behaves in strange voltage
conditions or strange conditions. This happened to be at 1.9 volts, 1.8 volts. Another
factor to worry about is wearout. I always thought that as you wear out your flash
memory, it just got crappier. But I guess that is not a scientific
term. So what Negin has done here is we had three different areas of flash memory
blocks. It is very small; these microcontrollers do not have a lot of memory. And we
artificially wore out the midsection through several read, erase, write cycles, as that is
more often used. And then we wrote a bit pattern and we checked to see what kind of
error rate we were getting. The lighter color indicates high
errors, and what is interesting to me is that the midsection actually has fewer errors, so
this tells you that the more you wear out your flash memory, the less likely you are going
to get an error. It was counterintuitive to me at first, but then I realized it makes perfect
sense, because what does wearout really mean? It means that you are changing the bias
from bias to one to bias to zero. So the more your flash memory is wearing out, the more
likely, or the easier it is to get it to change to the zero state. It also means that it is harder
to get back to the one state. So that is why those operations get harder. But to me it was
counterintuitive.
>>: [inaudible]
>> Kevin Fu: Sure.
>>: What is the linear [inaudible] binomial because of the more [inaudible]?
>> Kevin Fu: Negin, do you want to take a stab at that?
>> Negin: In each of those you have six zeros or seven zeros so [inaudible].
>>: But it seems like, maybe I am misinterpreting error rate. Isn't it just a probability
that at least one [inaudible] did this wrong? So if you have two bits, you could be wrong
and then double and then shouldn't it be exponential, because the probability that they are
all, the probability that at least one fails, well I guess that [inaudible]
>> Kevin Fu: I will have to think about that. I am not sure. The other problem is you
could have an exponential and we might just be in the portion that looks linear. But to
me it just sort of confirms yeah, the more zeros, the more errors.
>>: [inaudible] upside down approaching one hundred percent error rate [laughter].
>> Kevin Fu: All sorts of fun tricks you can do with graphs. Well, there is one other
property I want to talk about, and it is sort of the most important one: I did
not know at first that flash memory has an accumulative behavior. The way flash
memory works is that it represents zeros and ones by how much charge is in the cell.
And so if you attempt to do a write to flash memory and you fail to actually flip a bit, you
have actually partially succeeded. You got a little bit of charge to go into that flash cell
and so the next time you try to do a write, it is going to be a lot easier to do, because there
is already a little bit of charge there. And that is the primary property we are going to
exploit with our software abstractions to reduce the power consumption to let things run
at a very low voltage, and to build a coping mechanisms for when we do accidentally run
into an error.
So I am going to skip a little bit forward; I don't think we need to talk too much about
the model to understand how this is going to work. But I will just talk about some of the
design goals. We want to minimize the energy consumption. We also
want to minimize error rate, and delay is also important, because if you don't have a
predictable amount of time before your data can be written, it makes it a lot harder for the
system developer. Of course, you might not be able to get all three at once, but let's talk
about three different approaches and how they affect these different minimizations.
So we tried three different approaches to deal with errors at low-voltage. The first one
we called in place writes. This is where you can imagine you do a write to some location
and you just have a feedback loop: you keep writing to it until it sticks. The second one,
multiple place writes, is where we write a value to one location and then we write the same
value to a second location. Now there might be an error, but because the errors are these
Z Channel kinds of errors, if we ever see a zero, we know that it is a
proper zero. If we ever see a one, we don't really know what it is. So we can do a logical
AND of all of these locations at read time to figure out what the value was. Then the
third approach was more traditional: using error correction codes, Reed-Solomon Berger
codes, so throwing computation at the problem rather than just throwing writes to
memory at the problem. Yes?
>>: Quick question. At this point you are assuming that [inaudible] consumes less
energy than one write in high-voltage, right?
>> Kevin Fu: It's all about the workload.
>>: This is something that I should hope for, right, because if two writes at low voltage
consume more power than [inaudible].
>> Kevin Fu: It's actually slightly more complicated than that. So what you are pointing
out is what happens if our error recovery mechanism costs a lot of energy all the
time. That would be bad. So the good news is errors are pretty rare. So we don't usually
need to use our error correction mechanisms, but when we do need to use the error
correction mechanism, the question is well, is it a small reasonable cost or is it such a
huge cost that it is going to make the whole process pointless from the beginning, and
that depends on the workload. It is going to depend on whether you are write bound or
something else bound. But the good news is most workloads do actually work out for
this kind of scheme.
>>: Isn't it also the case that sometimes you don't have a choice? You are working with
low-voltage and [inaudible]
>> Kevin Fu: Exactly. Thanks for bringing up that point. So there are two different
regimes you can think about for low-power. There is low-power because you are trying
to save power and you have chosen to set a constant low-voltage to make your device last
longer, but the other regime is that you've got this energy harvesting device and you have
no control over your voltage and you have to live with whatever you have got. And that's
where we started from. We were given a low-voltage and we could either choose not to
save anything. In fact we went so far as to actually store things remotely over the air
because it was cheaper to do that than to bring the voltage up higher to write the flash
memory, so some really weird things happen in low-voltage, some very strange design
decisions.
So now comes the part where we have beer [laughter]. So we are going to represent
our bits as these beer mugs using negative logic. So here is a beer mug initialized to one,
and here is a beer mug set to zero; you can see it has lots of charge inside of it.
And we are going to use this to illustrate how these three different schemes work. So for
the in place writes, again, we are going to do these repeated writes, so we start with our
memory initialized to one and we want to make it into a zero. So we are going to add
charge, but at a low-voltage. So it is slowly dribbling out and our bit tender is putting
some beer in there. And it might take one or two or maybe three attempts to actually get
enough charge in there before it would be interpreted as a logical zero. So this is sort of
bending the notion of analog and digital. We are kind of doing hybrid digital analog all
at once in order to save power. Now this took three attempts. That is not very ideal.
That means it would take three times as long to do a write. Question?
>>: [inaudible] because it comes to [inaudible] it is a zero writing, does that mean that it
would also be zero 10 seconds later when you read it again?
>> Kevin Fu: That is a good question. So we have yet to find any errors of that nature
where it has flipped to a zero and then flipped back on its own. Now that is not to say it
cannot happen. I am pretty sure there is some way that could happen, but the only good
news is that we have not encountered it.
With that kind of scheme in mind, the sort of feedback loop where you keep rewriting until
you get the bit you want, you can imagine that we have on the X axis the number of
potential rewrites and then the error rate after that number of rewrites. If this approach is
to work, we would sort of expect a quick drop-off to a very low error rate after just a few
writes. If it does not work, well, we would see no change. So the good news is that at all
sorts of different voltages, you very quickly see a reduction in your error rate. So already
at 1.87 volts, if you do four in place writes you are going to eliminate pretty much all of
the errors. So you can actually just have a static routine with no feedback that just does
four writes and you would virtually be certain to have no errors. If you are willing to
have slightly higher voltages, which means you would not get the same energy savings,
you can get much higher reliability, so here after just one write, at 1.9 volts we didn't get
any errors detected in these many trials. So there you can kind of think of this feedback
loop as just an insurance policy that we never needed to execute. That is the ideal case.
We get really, really close to that wall where error starts to happen, but we never actually
hit the wall. So that is our best case scenario.
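In code, the in-place scheme is basically a read-back-and-retry loop around the raw flash write. A sketch (the low-level helper names are placeholders; a real implementation also has to respect the flash controller's timing and segment-erase rules):

    #include <stdint.h>

    /* Hypothetical low-level helpers -- placeholders only. */
    extern void    flash_raw_write_byte(uint8_t *addr, uint8_t value);
    extern uint8_t flash_read_byte(const uint8_t *addr);

    #define MAX_ATTEMPTS 4   /* at ~1.87 V, four attempts eliminated essentially all errors in our data */

    /* Returns the number of attempts used, or -1 if the value never stuck. */
    int inplace_write_byte(uint8_t *addr, uint8_t value)
    {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            flash_raw_write_byte(addr, value);       /* may only partially charge some cells */
            if (flash_read_byte(addr) == value)
                return attempt;                      /* accumulated charge finally reads back correctly */
        }
        return -1;                                   /* caller can fall back, e.g., retry at a higher voltage */
    }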
The other approach we call multiple place writes. So this is in collaboration with
Andrew Jiang, who has been here before to speak on rank modulation and multilevel flash
cells. So here we are going to show how to do multiple places. So imagine you have two
different bit locations, and in order to do writes you have to write to two locations. So
here you can see that we ended up with a one there and a zero at the bottom, because
we did not actually get enough charge into the Texas mug, so I will make fun of
Texas [laughter]. And then all you have to do at read time is do a logical AND. Because of
this asymmetrical error, you do the AND and you will get the actual value. So as long as
one of the bits was written correctly, you will read the correct value. We thought that this
would work a lot better. But it was nowhere near as competitive as the in place writes,
and my intuition on why this is the case is this particular mechanism does not exploit the
accumulative behavior of flash memory and so the accumulative behavior really helps out
the in place writes but it does not help multiple place writes, unless you combine them.
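For comparison, the multiple-place scheme looks roughly like this (again a sketch with the same placeholder helpers): write the same byte to two pre-erased locations, and AND them together at read time, since in this Z-channel error model any zero you observe is a genuine zero.

    #include <stdint.h>

    extern void    flash_raw_write_byte(uint8_t *addr, uint8_t value);
    extern uint8_t flash_read_byte(const uint8_t *addr);

    /* Write the same value to two pre-erased locations (both start as 0xFF). */
    void multiplace_write_byte(uint8_t *a, uint8_t *b, uint8_t value)
    {
        flash_raw_write_byte(a, value);
        flash_raw_write_byte(b, value);
    }

    uint8_t multiplace_read_byte(const uint8_t *a, const uint8_t *b)
    {
        /* Zero-writes can fail (the bit stays 1), but a 1 is never spuriously
         * turned into a 0, so ANDing the copies recovers the intended value as
         * long as each intended zero stuck in at least one copy. */
        return flash_read_byte(a) & flash_read_byte(b);
    }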
>>: It also seems like the flash memory is more homogenous [inaudible] it surprised me
>> Kevin Fu: Yes. I would've expected to see much more variation, but none was
detected, nothing significant. I am not even going to talk about the error correction codes,
the Reed-Solomon Berger codes, because they were about two orders of magnitude worse. It just
turns out that there is so much computation involved that basically you had
to pay an upfront fee to do the error correction before the first write, and you might
never need that error correction, and therefore you are always paying this penalty whether
or not you had an error, so we found that this feedback loop was much more effective
because we only paid for an error when there was an error.
I should move on since I am running out of time. We'll just do a couple of comparisons.
So, just to explain the difference between in place and multiple place
writes in terms of time and energy consumption: they were pretty close, and because they
are so close it is almost obvious that you want to use in place writes because it uses less
space. So if you were to use multiple place writes you would have a minimum of a 2X
increase in how much flash memory you would need to use to get the same job done. So
at the moment we see no reason to use anything other than in place writes.
Moving on. So, micro-benchmarks. Let's compare doing these in place writes at a low voltage versus just the standard approach if we were running at a high voltage. So there
are no real benchmarks for this yet, so we had to come up with our own. So we are going
to think about things that are sort of read-intensive and things that are write-intensive and
combine them. So we've got four different environments that we are going to run in. The
bluish colors represent our experiment, our in place writes at different voltages, and the
yellow orangeish ones represent the standard worlds, different microcontrollers have
different thresholds for writing to flash. So the first thing we are doing is the RC five
block cipher which obviously doesn't have a lot of storage; it is mostly computation. So
you can see that if you were to pay for the ability to have the potential to write to flash,
you are proportional with the worst-case scenario of energy consumption. This is
normalized energy consumption. And if you were to make it sort of more opportunistic
using halfwits you can save 30 to 50% just on your energy consumption. The difference
between the darker and lighter color is just how much reliability. The darker one, there is
a potential for getting an error, small potential. And then the lighter one we have yet to
find any errors, so it depends if you would accept having any kind of write errors.
The retrieve workload is very similar because there are hardly any writes going on there.
Again we make the energy proportional to the workload. The bad news is that storing is
sort of the worst thing you can do. I think this relates to the earlier question about what
if it takes more energy to store: yes, it takes more energy to store using this technique.
But I claim, so what. If you had a write-bound workload, where every single cycle was a
write to flash memory, yeah, you would see about 50%, you know, it would be twice as
bad as using the original scheme. The good news is that most workloads are not of that
nature. Your smoke detector is doing other things besides writing to flash memory. And,
in fact, one manufacturer we are working with gave us a device where they do a one-byte,
I believe it is a one-byte, write to flash about every 5 seconds, so we are talking about a
huge ratio in terms of how much non-writing they do, but they pay for writing the whole
time.
So the hypothesis is that these low-voltage writes and the halfwits regime would be much
better than the high-voltage system for typical workloads. I will just give you one
workload, a typical sensor-network monitoring application. Here we are going to read
256 bytes of accelerometer data, compute some rather simple statistics, and then store
that aggregated information to flash memory. Keeping it pretty high level, we were able
to get about a 34% energy savings, and again it is because we exploited this accumulative
property of the flash memory and were able to drive the voltage much lower, getting
pretty close to the actual safety margin rather than some artificial spec that sets a very
high safety margin for writing to flash memory.
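[Editor's note: a minimal C sketch of the sensor-monitoring workload just described, storing only aggregated statistics with the low-voltage in-place write; read_accel_byte and inplace_write are hypothetical helpers, not code from the talk.]

    #include <stdint.h>

    extern uint8_t read_accel_byte(void);                            /* hypothetical sensor read */
    extern int inplace_write(volatile uint8_t *addr, uint8_t value); /* low-voltage verify-and-retry write */

    /* Sample 256 bytes of accelerometer data, compute simple statistics, and
     * store only the small aggregate to flash at low voltage. */
    void log_accel_summary(volatile uint8_t *flash_dst)
    {
        uint32_t sum = 0;
        uint8_t min = 0xFF, max = 0;

        for (int i = 0; i < 256; i++) {
            uint8_t sample = read_accel_byte();
            sum += sample;
            if (sample < min) min = sample;
            if (sample > max) max = sample;
        }

        inplace_write(&flash_dst[0], (uint8_t)(sum / 256)); /* mean */
        inplace_write(&flash_dst[1], min);
        inplace_write(&flash_dst[2], max);
    }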
Here are some raw numbers just to let you think about how things are actually working.
If you were working in the standard model at 2.2 volts to do the workload I just talked
about, we measured 410 microjoules; this was measured by looking at the voltage drop
on a capacitor. But if you use our technique of in-place writes at 1.9 volts, where we
have yet to encounter an error, the energy consumption was only 300 microjoules, so a
pretty significant savings with no penalty for errors. Now let's say you take the approach
that errors are okay. There are many applications that are okay with errors; think of it
like UDP for storage. If you don't mind dropping down to 1.8 volts you might get an
error after two writes, and if it is okay that every now and then you get an error, you can
still have additional savings, down to 270 microjoules. But if you do care about having
perfect data, we found that at least for this microcontroller 1.9 volts is sort of a sweet
spot that works well enough before you hit the curve.
Okay. There are all kinds of other little improvements you can make. One is kind of fun:
if you have data that is always filled with a bunch of zeros, maybe you should flip all of
your bits. So if you have a sign bit, just store the complement, and it will actually take
less energy, because you are less likely to have errors. This comes up in EKG data, for
instance; if you are storing medical data, you are going to see a lot of zeros. And you can
also do compression, things like memory-mapping tables that basically use codes to
reduce how many zeros you need to write.
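[Editor's note: a minimal C sketch of the bit-flipping idea above, storing the complement of mostly-zero data along with a flip flag; this per-byte encoding is illustrative, not the exact scheme from the talk.]

    #include <stdint.h>

    /* If more than half of a byte's bits are zero, store its complement instead,
     * so that most stored bits are ones (the erased state) and fewer bits need
     * to be programmed. The flip flag has to be stored alongside the data. */
    uint8_t encode_for_flash(uint8_t value, uint8_t *flipped)
    {
        int zeros = 0;
        for (int i = 0; i < 8; i++)
            if (((value >> i) & 1u) == 0)
                zeros++;

        *flipped = (zeros > 4) ? 1 : 0;
        return *flipped ? (uint8_t)~value : value;
    }

    uint8_t decode_from_flash(uint8_t stored, uint8_t flipped)
    {
        return flipped ? (uint8_t)~stored : stored;
    }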
So to summarize this low-voltage approach to flash memory: it is a regime where we are
trying to make our energy consumption proportional to the workload rather than to the
worst case. Microcontrollers today are basically being abused for cost and manufacturing
reasons; they run at a much higher voltage than necessary. Our abstractions are forcing
us to do this, so we basically violate the digital abstraction in order to save energy, and
we are able to bring back some of that reliability through these error-correction
mechanisms.
And we are actually considering some commercialization of this. I will just say that we
call our commercial version of this Smash Memory, for smarter flash; it sounds a lot
better than flarter [laughter]. But you can start to think about how, if you use this
halfwits regime, you can actually reduce your manufacturing cost, because you can now
live with components that have higher variability. If you had a resistor with 5% versus
10% tolerance, well, now you can use the crappier ones and still get away with it. You
can use smaller batteries and make the device smaller and lighter, and you can also make
it greener, because there are fewer chemicals going into the environment as you scatter
these things around. And I will just leave you with this: there are a lot of devices out
there that could benefit from this. I had Negin working with a number of undergraduates
on what we called looking for 100 microcontroller products in 100 days, and we did all
sorts of power measurements to figure out what the microcontrollers were doing and
what they could do, and there is a lot of waste out there. Okay. So that is halfwits. Oh,
was that a question?
>>: Some of the things that there are funny.
>>: You talked a lot about the writes but didn't touch too much on the erases [inaudible].
>> Kevin Fu: Yeah, at the moment, you know, it is just a Turing machine with an
infinite tape. So I am assuming we don't need to do the erases right now, or only
occasionally, because we are working mainly with energy-harvesting devices; we assume
that occasionally we will have these happy power seasons where we do have a lot of
energy and we can do sort of an erase of everything that is not being used. That may not
be the case in all applications. I don't know; Negin, do you have any comments on erase?
>> Negin: [inaudible] we never saw any errors because of erase [inaudible] we don't get
any errors [inaudible]
>>: Even as the memory wears out and gets harder to…
>> Negin: At that point yes, but during the [inaudible]
>>: And does that require a higher voltage?
>> Negin: No. Just the same.
>> Kevin Fu: I think [inaudible] in the back first?
>>: It seems similar to frequency scaling: if, instead of [inaudible], you clock the
frequency a lot higher, you're going to finish your job much faster and save energy that
way. Any comments or are…
>> Kevin Fu: I have a couple of comments about frequency, about choosing the
appropriate frequency. One is no comment, and the second is that the workloads you see
on these kinds of devices are not the typical workloads you would have on a desktop or
on a node in a data center. They tend to be doing a lot of sampling, for instance. You
could speed up your clock to get more things done, but whether you actually need that
granularity is not clear.
>>: [inaudible] compute card [inaudible]
>> Kevin Fu: Typically the bottleneck is storage, how much you can store, because
storing to flash memory is an order of magnitude worse than the CPU, usually. Do you
have anything to add?
>> Negin: Yes, that's right, and there is another approach we call slow writes, where we
just slow down the writes to flash memory and the errors drop. It is much better than
just [inaudible].
>> Kevin Fu: So there are different ways to get similar effects.
>>: You mentioned flash and fire alarms to settle insurance companies’ lawsuits. What
do the fire alarm lawyers say about [inaudible] conflicts [laughter]?
>> Kevin Fu: I think it is the motivation. I think it will be a struggle to actually have
this work in a smoke detector in practice, because they have to pass UL inspection, and
anything that involves high criticality has huge safety margins; for instance, although the
battery is rated for X years, they only advertise it as X over two years in order to get
their safety rating. So there could be some uphill battles there, but I think this is the kind
of case where there are certain policies in place that might not make sense anymore, and
there are trade-offs, but we can quantify those trade-offs.
>>: For a fixed reliability, suppose you were given a reliability target of say 90% or
95%, whatever, and you could set the voltage as you like, what would be the ideal?
Because my intuition is that you would just set it just above that cliff and then not do the
multiple writes.
>> Kevin Fu: Well, there are a couple of complicating factors. First of all, static versus
dynamic; I will just fill that in since you asked the question earlier. I think you could do
a lot more with dynamic: if you had errors and you have control over your voltage, you
could certainly choose a higher voltage. But if you don't have control over your voltage,
all bets are off; you don't have a choice anyway. Let's see how to address that. I think it
is going to be tricky, because as you saw in the earlier slides, even chips from the same
family have very different cliffs, so you could pick a voltage, but it might be different for
the other chip. So what I would recommend is that you profile 50 or 100 chips and then
just take the maximum value and say, well, according to process variation that seems to
be about as high as the cliff would ever get. Set the threshold there, and that would
probably be a good static setting. You wouldn't want anything more conservative than
that. There is no need to be more conservative unless the chip itself changes while
keeping the same model number. So if you are a company making a billion widgets and
you always use the same chip, you could do this, unless the manufacturer of the
microcontroller changes its behavior without telling you; then you would have to adjust.
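[Editor's note: a minimal C sketch of the static calibration suggested above, profiling a batch of chips and setting the fleet-wide write voltage to the highest cliff observed; measure_cliff_voltage is a hypothetical per-chip profiling routine.]

    extern double measure_cliff_voltage(int chip_id);  /* hypothetical: lowest voltage at which this chip's writes stay reliable */

    /* Profile a batch of chips (say 50 to 100) and return the highest cliff
     * observed; the worst chip sets the static write voltage for the fleet. */
    double pick_static_write_voltage(int num_chips)
    {
        double worst_cliff = 0.0;
        for (int id = 0; id < num_chips; id++) {
            double cliff = measure_cliff_voltage(id);
            if (cliff > worst_cliff)
                worst_cliff = cliff;
        }
        return worst_cliff;
    }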
Okay. I will just wrap up with a couple of other weird things we are doing in my lab.
One thing we are doing to get these embedded devices to work well is handing out
implantable medical devices, not for human use. But if you are interested in using
medical devices for your research, come and talk to us; we are loaning these out. They
are from patients who did not need them anymore [laughter]. So for your homework,
besides that, you can start to think about all the interesting problems in this space of
very, very low-power computing. There are interesting hardware issues, like what you
can actually move off of the chip to save energy. There are some interesting circuit
problems too, and memory choices. There are also all sorts of interesting options
available: most of us use flash memory today, but phase-change memory and ferroelectric
RAM and all of these other technologies are coming along with different kinds of error
properties, and some of these, we believe, may work well with this halfwits kind of
model. And then on the OS and PL side, there are lots of fun things to do, like task
scheduling and compile-time optimizations; I think it is sort of fertile ground where not a
lot has been tried yet. And there are probably some really cool ideas from the ‘60s, ‘70s
and ‘80s, maybe even things that were used on punch cards that don't make sense
anymore, but might make sense now because we are going so low power.
If you want to learn more, and if you don't want to keep repeating your computations, or
our computations, you can go to our website, our Roman Empire, SPQR.CS.UMass.EDU.
That is all I have for today, so I would be happy to take any questions now or offline,
but thank you for coming.
[applause]