>> Karin Strauss: Good afternoon. My name is Karin Strauss and it is my pleasure to
introduce Professor Kevin Fu. Kevin almost needs no introduction. He is a regular here
at MSR, but he is an associate professor at UMass Amherst. He is an expert in security
and privacy in embedded devices. He has won many awards including the TR35 and
the Sloan Fellowship, and without further ado, Kevin.
>> Kevin Fu: Thank you Karin. And it is all of course because of having great students
so I have Negin here today and Jeremy also has co-authored papers with me in the past.
You can't have good research without great students.
What is this talk about? I normally work in security and privacy, but about six years ago
we ran into a problem whereby we were trying to get cryptography to run well on an
RFID scale device. We actually have one here today. So my model is a programmable
RFID tag, no batteries; it takes all of its energy from radio waves essentially and performs
computation and sensing and then uploads the data. But we could not get the crypto to work
well, and the bottleneck was that we could not store anything because the flash memory would
not work at low voltage. So this kind of sent us down a path of okay, let's think more about
the platform and rethink abstractions, because if we can't get basic things like storage to
work then there is no hope for the more interesting higher-layer issues like
cryptography and applications in general.
So this talk is going to be covering sort of two different domains of non-volatile storage
on these embedded devices, typically extremely low-power. We are not talking ARM
processor; to me that's like a power plant. We're talking much lower power than ARM
for this talk. This is based on two papers. First we will talk about Mementos which was
published at ASPLOS earlier this year, and it is essentially a set of compiler
optimizations to make programs run well on a platform like that where the platform
might reboot say every 100 milliseconds because of energy loss, because it just can't get
enough energy from harvesting. And then the second half of the talk will be about some
research that Negin had led and it's about how to write the flash memory at voltages that
the manufacturer says don't. And we will show how we can do that well and save quite a
large amount of power.
Feel free to interrupt with any questions. We can always go down a different pathway if
you find a certain topic more interesting than another. Just a little bit of motivation and
kind of to bring us back to some fundamentals, in the 1980’s Mark Weiser was talking
about ubiquitous computing. This was a term he liked to use. And he would say
that the most powerful technologies are those that are effectively invisible when you're using
them. You could almost argue that phones are getting close to being invisible, because
sometimes you forget that it is in your pocket. I am sure we are going to hear something
ring today. So that is ubiquitous. But when a batteryless invisible computer comes to be,
then you run into a fairly simple problem, a simple problem with profound implications
and that is that the thing reboots a lot. When you take away batteries, you no longer have
that safety buffer of energy, and so Mementos is our way of addressing how to get
programs to run well when you reboot a lot.
So we have sort of a three-step approach to getting things done and that is first we took
this emerging platform which Jeremy has there. Again, it is batteryless and programmable,
effectively an RFID tag, and we are going to try to make it more robust. So to make this
programmable RFID tag more robust, we first worked on getting a compiler to do good
optimization and also doing trace driven simulation, because if you just go and try to
build this on hardware, you might find that you just built a wonderful piece of software
that has no use. So the nice thing about a simulator is you can very quickly remove things
that would not have worked on hardware, and get a pretty good sense of how it would
perform on hardware.
So here's a picture of one of the earlier devices. This is the tag that we are starting to
actually manufacture. It is based on the Intel WISP. We call this the UMass Moo only
because we think it looks like a Longhorn steer, but it is a programmable RFID tag, no
battery, and in fact all of its energy is stored right here in this 10 µF capacitor. And I am
thinking they might want me to use this laser pointer instead of walking. This little 10 µF
capacitor is the only buffer we have for energy. The whole microcontroller runs off that
thing. And it gets filled up with this antenna; this dipole antenna does all of the energy
harvesting from RF and it also does the digital communications, the wireless
communication back to whoever is interrogating it.
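Just as a back-of-the-envelope, assuming the capacitor charges to roughly the 3.4 volts mentioned later in the talk (an assumed operating point, not a measured figure):

    E = ½ C V² = ½ · (10 µF) · (3.4 V)² ≈ 58 µJ

so the entire energy buffer for the device is on the order of tens of microjoules.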
To give you a perspective of just how much energy that is, it is about five or six orders of
magnitude less energy than you could store in a AA battery. So imagine trying to
compute with five or six orders of magnitude less energy than a AA. That is what we are
doing. Here are some of the problems you are going to run into immediately when you
try to compute on these extremely small-scale devices. First, here is an actual trace. We
removed the units to focus in on one thing. What do you think of when you see this line
going up and down a lot? What word might come to mind statistically? Does it look
constant to you [laughter]? No. This is highly variable, so this is a voltage trace and the
one thing to take away is we have a highly unpredictable, highly variable voltage supply,
the exact wrong thing you would ever want to give a microprocessor or any digital
system.
So we're going to have to cope with that. So the coping mechanism is a small capacitor
to help us smooth things out. The problem is with a small capacitor when it drains, think
of it as your water tower draining, you don't have water anymore. So in this case instead
of just not getting clean, we don't compute. And when we don't compute, well, we have
reboots and it is typically on the order of 100 to a few hundred milliseconds before we
have a reboot. So I asked the lead graduate student on this, Ben Ransford, to plot the
reboots, and he got creative about it. This is an
actual plot of how often the device in Jeremy's hand there was rebooting. And this was
when it was directly next to a 1 Watt RFID reader, that is, it was getting about
the most power it could possibly get, and yet it was still rebooting. It is hard to compute
like that.
And so we liken this to the idea of having a small child. So imagine you're trying to
write your paper and you have a small child who's trying to get your attention and just
starts throwing this beach ball [laughter] around your head. And every time the child
interrupts your thoughts you forget what you are doing and you have to start over writing
your paper. You're never going to get done with your paper, right? And all of the
authors by the way have toddlers [laughter]. That's what restraints are for, no, um.
Parallelism, I don't know. We eventually got done. But it can be hard to get done when
you have constant interruptions, so we call this a Sisyphean task. In the great myths of the
classical era, poor Sisyphus had to roll a rock up the hill, and every time he got near the top,
he dropped it and had to start over. So you can imagine your computation: you're
rolling it up the hill and every time you get almost done you drop it, and you have to start
all over because you lost your energy. So we are trying to avoid that. We are trying to
figure out how to get these long-running tasks to complete in short bursts. So before we
get into our approach, I will just give you an idea of how it is done today. This is not a
new problem, but this is a different way of thinking about the problem.
So typically the industry will just constrain the problem so you don't have to worry about
reboots. So this is a contactless smart card, and has anybody been on the London system
yet? So this is the London Oyster card they use for payments and there is a pretty
standard rule: 300 milliseconds is the golden number. Anything you do, whether it be crypto or
communication, it needs to finish in 300 milliseconds, otherwise there is sort of an
exponential increase in passenger frustration. As they are swiping the card, passengers get
angry at it and they don't understand why it is not working, so everything always finishes in 300
milliseconds. And if your computation takes more than 300 milliseconds, they say well,
you shouldn't be doing that in the first place. So we think that is a silly idea for
general computation, so Mementos is going to relax this constraint so we can make
general-purpose computation feasible. Of course it may take many, many reboots.
So in the typical world of energy, you are going to be given a relatively constant voltage,
which, if you integrate it, is infinite energy over time. But in the real world, our
world, these are real traces over 40 seconds each. Your voltage is going to vary.
And it is going to vary quite a bit. These three different traces were of a student holding
this tag harvesting energy and doing the exact same thing each of the three times, but you
get a radically different voltage trace. So we have to be able to cope with this variability.
So the approach to dealing with a variable voltage supply and all of these constant
reboots is fairly straightforward. It is checkpointing and I think many of you are
probably aware of the checkpointing work. There is tons of research on checkpointing
especially in the last few decades for things like grid computing, which I guess is now
called cloud computing. Here is a trace of our voltage as we are about to die. So 1.8
volts up here is where our particular microcontroller will start to fail and so that is the
death point right there. And this is actually part of an exponential decay, but we are kind
of zoomed in so it looks linear. But imagine that your computation is rolling along and
there is some region as you're getting close to death where you think well, that is a good
place to checkpoint what you are doing. So that is what we do. As the computation
comes along, we checkpoint it before we die, and at that point we sort of go into a
hibernation state as we are harvesting energy. When we get enough headroom and
enough energy we can start over again, do the restoration, and then continue our
computation. So that is the easy part. Question?
>>: Just a quick question before you [inaudible] why don't they just increase the
capacitor…?
>> Kevin Fu: Good question. We tried increasing it; actually, Jeremy increased the
capacitor with a super capacitor. You can increase the capacity. The problem is that the
voltage on a capacitor depends on how much energy is inside of it, and basically if
you put, say, a super cap on here, it might take 24 hours to charge before it responds. So
there is a trade-off between responsiveness, and for us we want one second or less
response time, versus this sort of insurance policy of how to cope with energy outages.
>>: So you need multiple capacitors instead of [inaudible]?
>> Kevin Fu: You could have a bank of capacitors…
>>: But it increases the cost.
>> Kevin Fu: It increases the cost and complexity. So for most of our work we focus on
sort of the fundamental case of a single small capacitor. And there are a lot of fun tricks
that you can do to use super caps, but it always involves something like hybrid energy
harvesting, adding solar panels or other energy sources, at least in what we have found so far.
Yes?
>>: I have a question. To do the checkpointing, you need to know at what point in the
future you need to do the checkpointing. It has to be predictable, right? But what if
something happens out of the blue that you haven't accounted for, and then you lose data?
>> Kevin Fu: That is precisely the hard part of the problem that we are
going to try to address in the next few slides. The easy part is knowing that we have to
do checkpoints. The hard part is knowing when to checkpoint and what to checkpoint.
We are going to use a running example of a CRC check as our application. It might
sound a little silly but believe it or not CRCs are still pretty hard to do on these platforms
just because they are so low-power. So it takes just slightly more than half a second to
run this code and it takes about 575,000 CPU cycles. The interesting thing to note is that
we reboot about every 100 milliseconds, so we are never going to get done with this very
simple CRC without at least checkpointing. And then the next question is, what are the
best checkpoint strategies so we can get pretty close to some kind of minimal running
time. So that is what was driving us to get the programs to run well.
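To make the running example concrete, here is the kind of bitwise CRC-16 loop we mean (a generic sketch, not the code from the paper; the CCITT polynomial and buffer handling here are just illustrative):

    #include <stdint.h>
    #include <stddef.h>

    /* Generic bitwise CRC-16 (CCITT polynomial used as an example). */
    uint16_t crc16(const uint8_t *buf, size_t len)
    {
        uint16_t crc = 0xFFFF;
        for (size_t i = 0; i < len; i++) {
            crc ^= (uint16_t)buf[i] << 8;
            for (int b = 0; b < 8; b++)
                crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                     : (uint16_t)(crc << 1);
        }
        return crc;  /* on a low-MHz MCU this adds up to a lot of cycles for a big buffer */
    }

A loop like this never finishes within one burst of harvested energy, which is exactly the problem.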
So let me give you an overview, a programmer’s overview of how you would use our
system Mementos to make your code run. You can write your code in C, actually you
can write in any language that LLVM supports. So we use the LLVM compiler instead
of GCC; it's a little more modular, easier to modify. So you write your code and you will
need to choose certain parameters, and these parameters will
determine the efficacy of your checkpoint strategy. Now, using the parameters is what's
done by Mementos. What Mementos does is instrument your code, its intermediate
representation, during the compilation stage. It instruments it with energy checks. It
becomes energy aware. It has a notion of what is a capacitor and how it behaves in
different conditions. And then we built a simulator, but it is not just any old simulator; it
is a trace-driven simulator. It actually takes as input voltage measurements from an
oscilloscope so we can see how the program would behave in a realistic scenario. It
makes it reproducible so you can actually learn from it. The simulator helps us pick these
parameters so that we can dump it onto hardware and see it actually run.
So that is the basic idea of what Mementos is doing inside the compiler and inside the
simulator. The next piece is understanding the heuristics we used to actually instrument
code. So I said there were two things that we have to solve. We need to figure out when
to checkpoint and what to checkpoint. So this is going to answer the when-to-checkpoint
question. So here is your code. What is a good place to checkpoint? You might choose
to checkpoint at the end of a loop latch; that makes some intuitive sense because
you've just finished some important computation, so there's probably
something worth saving in your registers. So that seems like a logical place to do it, or maybe
after a function call return, because in traditional stack oriented machines, if you do a
function call you push all of your arguments on the stack. You do your work. You come
back and you pop everything off. So we would expect to have a local minimum in our
stack and therefore much less state to have to checkpoint in the first place. We have a
number of other instrumentation strategies, some involving timers and that is in the
paper, but I think these two should give you a sense of how we are going to decide where
and when we are allowed to have checkpoints.
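To make that concrete, here is a rough sketch of what instrumented code ends up looking like (my own illustration, not Mementos' actual output; every name here is made up). A trigger-point call is inserted at loop latches and after function-call returns, and the trigger only checkpoints when the measured supply voltage has dipped below the chosen threshold:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical helpers -- placeholders, not the real Mementos runtime. */
    extern uint16_t adc_read_vcc(void);          /* voltage measurement via the ADC */
    extern void     checkpoint_to_flash(void);   /* save registers + RAM, then hibernate */
    extern uint16_t crc16_step(uint16_t crc, uint8_t byte);

    #define V_THRESHOLD_ADC 620   /* placeholder raw ADC code for the chosen threshold */

    static void mementos_trigger(void)
    {
        /* Only checkpoint when the supply has dipped below the threshold. */
        if (adc_read_vcc() < V_THRESHOLD_ADC)
            checkpoint_to_flash();
    }

    uint16_t crc16_instrumented(const uint8_t *buf, size_t len)
    {
        uint16_t crc = 0xFFFF;
        for (size_t i = 0; i < len; i++) {
            crc = crc16_step(crc, buf[i]);
            mementos_trigger();        /* trigger point inserted at the loop latch */
        }
        return crc;
    }

    void application(const uint8_t *buf, size_t len)
    {
        uint16_t result = crc16_instrumented(buf, len);
        mementos_trigger();            /* trigger point inserted after the call returns */
        (void)result;
    }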
In other words we are not going to do a checkpoint right in the middle, because we don't
think that is logical. The next thing we did is we simplified the problem of knowing--let
me back up a bit. That was static time. That was compile time where we decide where to
put potential checkpoints. At runtime we need to know, okay, we have entered an area
where we could do a checkpoint; should we actually do a checkpoint? To do that we
pick a threshold, and if our voltage supply has dipped below this threshold, we say all
right, let's do that checkpoint. So as we get to death, this is a happy talk [laughter].
Imagine you have your checkpoint. And don't forget it is not instantaneous. Your
checkpoint does take time. You could checkpoint early. You could be very conservative.
Maybe you are a fiscal conservative and you will do this. But then you will find that you
are just wasting energy, because if you think about it for a moment, if you are
consistently checkpointing early, there is that little tail and what that means is every
single time you run your program you are going to have a much shorter run, and you are
going to end up having many, many more runs and so you are going to be paying a lot
more energy in your restoration checkpoints, because you will be doing more of them.
On the other hand, if you were to checkpoint a little bit too late, like my little brother,
then you are probably just going to have to start over or give it to your older brother the
processor to finish. So you don't want to miss that final point. So really what you want is
this Goldilocks checkpoint area. You want to do it right before there. You do this sort of
flatlining right before you die, have the checkpoint happen. And it is pretty tricky to
locate, but if you make some assumptions about how your checkpointing process
works, you can still pick a voltage threshold on the conservative side such that it is very
unlikely to die without having checkpointed. Yes?
>>: So you are assuming, and I assume you are probably correct almost all the time, that
the checkpoint is really expensive, but if the checkpoint were really cheap you could just
do it all the time.
>> Kevin Fu: If checkpoints were cheap, we would do it all the time.
>>: Couldn't your code detect though, for this particular program the checkpoint is just
[inaudible] this tiny CRC is…
>> Kevin Fu: That is correct. There is still a huge amount of optimization that I think
can still be done. For instance, later on I will--I would just say now that it would be nice
to have more dynamic understanding of the program. So right now we just assume sort
of a worst-case scenario; we checkpoint all registers and all RAM because RAM is small.
The earlier version only had 256 bytes of RAM, so we would just checkpoint everything.
But as RAM grows on these devices, we are going to have to be smarter, think about
compression, think about more of those interesting things. But writing to flash memory
is a very expensive operation on this device, because you have to jack the voltage up in
order for it to run well.
>>: Have you tried choosing [inaudible] dynamically, so you lower it until things
misbehave and you bump it back up until your [inaudible]?
>> Kevin Fu: We haven't done anything dynamic yet. I think that is the next step. Right
now, we have said we need to simplify the problem so we can at least make it solvable.
Oh, one more question.
>>: So you have this voltage threshold set and then you probably have a monitoring unit that
monitors [inaudible] as soon as it gets [inaudible] does that. So what about the overhead
of the monitoring…
>> Kevin Fu: I do have a slide that talks about the cost of the monitoring. If you can
wait it will be probably about 10 slides later. But there is a significant cost to monitoring
so you also don't want to do that just all of the time. So already you can see it is pretty
hard to do. Let me just tell you about some other things that make it even harder. Our
flash memory has no flash translation layer. We have to manage the whole thing on our
own. So oh, you filled up your flash and you need to erase a segment of flash before you
can write to just one word of it. You've got to do that yourself. And by the way, you're
checkpointing your registers, so you can't use any registers to make decisions. It is kind of
circular thinking, right? It is really hard to write code, when you are trying to checkpoint
the system itself. And you can't just overwrite arbitrary things in flash memory. That is
not how it generally behaves. It is very painful. I feel sorry for Ben because this was a
huge amount of development. Let me just tell you about some of the debugging. I
believe he was saying, and correct me if I am wrong, I believe he told me he actually
implemented Manchester encoding over the LED for debugging, which to me
is the equivalent of printf debugging. It is really hard to debug these things when there is
no user interface. And moreover, if you touch it with a wire you might actually disturb
its performance. So it is very hard to work with these devices.
Fickle harvesting, those first three slides I showed you, these three voltage traces. It is
completely not reproducible. If you run an experiment and have a colleague at another
university replicate it, forget it, because it is a different humidity over there. So that is
why we focus on making these reproducible.
>>: You can't use a register to make decisions, how do you make decisions?
>> Kevin Fu: You just need to be careful, [laughter] very careful, and this means you
lose a lot of the abstractions that we take for granted. So to bring back a little bit of
reproducibility and science to all of this we decided we needed to have a trace driven
simulator. If we couldn't get people to reproduce the results, then what is the point? So
this is based on a tool called MSPsim. It has already been out there, Ben and others
augmented it to actually capture notions of energy. Literally what he did is he took a
scalpel and cut off the energy harvesting front end of our little device. We would
measure the voltage coming out of it, and then that voltage would be fed into our
software's simulated capacitor, so it is sort of a first-order physics approximation of how
capacitors behave under different voltages. And then we throw that into the simulator
and this causes the simulator to have a notion of when to fail, when to reboot when there
is not enough voltage.
Again it is sort of physics 101. There are all sorts of weird nonlinearities when you are
dealing with capacitors. But we did validate this. We actually measured the end-to-end power
consumption, energy consumption rather, in order to verify whether our simulator was
pretty good. And you can see the paper if you want to know more about that accuracy.
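For intuition, a first-order capacitor model of this kind can be as simple as the following sketch (a crude Euler integration with made-up numbers, not the actual MSPsim extension): each simulation step, the capacitor voltage moves according to the harvested current and the current drawn by the simulated MCU, and the simulator declares a brown-out once the voltage falls below the operating minimum.

    #include <stdio.h>

    /* Crude first-order model: C * dV/dt = I_harvest - I_load.
     * All parameter values below are illustrative placeholders. */
    #define CAP_FARADS  10e-6      /* the 10 uF storage capacitor */
    #define V_BROWNOUT  1.8        /* MCU starts to fail below ~1.8 V */
    #define DT_SECONDS  1e-5       /* simulation time step */

    static double step_capacitor(double v, double i_harvest, double i_load)
    {
        return v + DT_SECONDS * (i_harvest - i_load) / CAP_FARADS;
    }

    int main(void)
    {
        double v = 3.4;                       /* start fully charged */
        for (long t = 0; t < 100000; t++) {
            double i_harvest = 0.0;           /* would come from the recorded trace */
            double i_load    = 0.5e-3;        /* assumed draw while computing */
            v = step_capacitor(v, i_harvest, i_load);
            if (v < V_BROWNOUT) {
                printf("brown-out after %ld steps\n", t);
                break;
            }
        }
        return 0;
    }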
You've got this program that our handy-dandy Mementos pass has instrumented with
energy checks. Now we would like to see how it actually performs in the simulator. So
the simulator takes as input two things, the executable, the instrumented executable, and
a voltage trace. This allows it to be reproducible because you can give the voltage trace
to somebody else and they can go run it. The output is a little more complicated. The
simulator is going to tell you in the ideal world here is how many reboots it is going to
take to finish. Here's how many CPU cycles and total wall time and also an execution
trace so you can go back and see why things fail. To give you an idea of what it looks
like, here is a real trace that has been annotated a bit with illustration. The blue down
there on the bottom represents the energy harvesting front end. When we harvest energy
it is fairly low voltage and that we use some special circuitry; it is called a charge pump
in order to get the higher voltage and we put that into that 10 µF capacitor. So you can
see it takes about 3000 µs until we get up to about 3.4 volts. Suddenly our hardware
turns on and starts to compute. And we are sliding down that voltage. As we compute,
the voltage on the capacitor is going down until we fail at the red mark, where we do a
checkpoint, right before failure. And then that little green dot up there represents a
restoration. So you can see it takes many reboot cycles to get this CRC check done.
The simulation is great, but what would be better would be if our simulator could tell us
not only how our program performs but how well it could perform. So during the
rebuttal process we decided we needed to implement what we called an Oracle. It
essentially, with certain assumptions about the workload, finds the optimal
way you could ever possibly hope to do checkpointing. It does not necessarily imply that
you could reach this optimal. It doesn't even tell you the strategy to do, but it says here is
the best thing any strategy could do. You can think of it as sort of a binary search on our
threshold voltage. So we are assuming that the only knob we have is the threshold and
with this knob we need to know where to set it, and we need to pick that knob to be
static because we don't yet have any dynamic control.
So the first thing we do is we run the program at a very low threshold, and of course we
are going to die. And we keep bringing that threshold up until we don't die and then we
think oh, let's see if we can inch it back a little bit more and so we do binary search on
those little deltas until we can finally zero in on what we consider to be a pretty good
threshold voltage where we don't expect to get any closer to death and still survive. So
that is how our Oracle works. The Oracle then outputs what was this running time. So
we can use that as sort of a benchmark of how good are we.
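Roughly, the Oracle's search amounts to something like this (a paraphrase of the idea with made-up names; simulate() stands in for a run of the trace-driven simulator that reports whether the instrumented program completed):

    /* Binary search for the lowest static checkpoint threshold that still
     * lets the program finish on a given voltage trace. */
    extern int simulate(double threshold_volts, const char *trace_file);

    double oracle_threshold(const char *trace_file)
    {
        double lo = 1.8;     /* too aggressive: the program dies before checkpointing */
        double hi = 3.3;     /* very conservative: always survives, wastes energy */
        while (hi - lo > 0.01) {          /* stop at roughly 10 mV resolution */
            double mid = (lo + hi) / 2.0;
            if (simulate(mid, trace_file))
                hi = mid;                 /* survived: try inching the threshold lower */
            else
                lo = mid;                 /* died: need a higher threshold */
        }
        return hi;                        /* lowest threshold known to survive */
    }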
How did we actually do this? At the high level, again, Mementos is splitting these
computations into chunks. At compile time it tells you where the edges of those
chunks of computation are, and then at runtime it tells you sort of which ones are actually
going to cause interesting events. We are using the CRC example; we have also
done some elliptic curve and RSA, but CRC is sort of the simplest example to demonstrate
how the thing works. Unfortunately there is not a lot of other code out there to compare
against. Nobody else wanted to go through this pain. I can understand why. So what we
do is we compare against our checkpoint Oracle. So at least we know how close we are
to ideal.
To give you an idea of just the scale, Mementos itself is only about 2 kB. And one
kilobyte of our flash memory is reserved for the checkpoints themselves. We have
basically banks of checkpoints and it is kind of tricky to swap between them, because we
don't want to erase a checkpoint before we know our next checkpoint is actually set. It is
kind of tricky to do when you don't have full control of your registers. This is answering
your question about why we don't just constantly monitor. Doing a measurement of the
voltage supply takes about 0.1 milliseconds. This is mostly because of the analog-to-digital
converter. This is a fairly significant cost. It translates into a large energy cost. It is
pretty close to a flash write, so we don't want to do it all the time. But then the checkpoint
itself only takes about 4 milliseconds and the restoration two milliseconds.
One other thing to point out. The question we often get is why not just use TinyOS and
believe me we wanted to use TinyOS. TinyOS is this operating system, library rather, for
sensor motes, and the problem we ran into is that it took about 100 milliseconds to boot
TinyOS, isn't that right? And that was after you optimized it, right? So if you optimize
TinyOS you can get it to boot up in about 100 milliseconds, and we needed two milliseconds.
We much preferred two over 100; otherwise we would never get anything done. We would
boot up and we would say goodbye.
So let's take a look at this CRC test case, and we ran it through our Oracle using this
energy trace, so we just randomly picked this guy. In the paper we have I think on the
order of 10 different traces to exercise different characteristics. And the Oracle told us
oh, it will take you about 4 seconds and the best you could ever hope for would be 14
reboots and, by the way, we recommend, based on our Oracle, picking a threshold of 2.35
volts. Just pick that, again not dynamic, static, and that's what you should get. So we ran
this against the simulator with our actual checkpoint strategy, because we can't just pick
any strategy, we have to pick something the compiler knows how to do, and we were able
to finish with about 16 reboots in slightly more than 6 seconds, and that was with a
threshold of 2.6 volts.
It's not as good as ideal, but it is getting pretty close and that is sort of where we are
today. And I can just point out that sometimes you read a systems paper and
you ask is this a paper about improving performance, or is it about better semantics? And
this is definitely not about better performance. Here it is about semantics. We are trying
to get programs to actually run. The status quo is programs don't run. So at least we get
them to run, and a penalty of about 10X is pretty typical, but at least we can get them to
complete, if there is a pathway to completion. You have already seen that without
Mementos, if you don't have any kind of checkpointing strategy, you get
these Sisyphean tasks: you roll your computation up the hill and you keep dropping it
and you will never finish.
There is tons of related work in this space, in RFID scale devices; especially I will point
you to a paper by Michael Buettner at University of Washington, Dewdrop at NSDI. He
has a different approach to dealing with energy constraints. They essentially have a
scheduler and the scheduler will look at your energy levels and decide which tasks to run
based on how much energy is left. So if you are very low on energy, you might only do a
quick task. In our work, we are thinking that we have a single task, so we don't really
have a choice to do any scheduling. And again checkpointing has a rich history. I
usually think of things like Condor from I think it was the early ‘80s or maybe it was the
early ‘90s, but from Wisconsin on grid computing, because if you want to do migration
of your computation you need to checkpoint it. So there are a lot of interesting things
that you can borrow from the checkpointing literature of yore.
As Jay mentioned, why not do dynamic? We think it would be really cool to have some
kind of dynamic decision making. We just kind of chose to solve a simpler problem first.
It would be a great extension and I would hope that we could do better, but it is not
obvious to me if dynamic would actually make it better, because anytime you need to
make decisions it is going to cost you CPU and it might actually make it worse if you
miss the deadline, so it is not clear if it will work. We are looking into all kinds of other
non-volatile memories, phase change, ferroelectric RAM; some of these might make it a lot
easier to do. That is still yet to be seen. We originally were doing some compressed
checkpoints. But as you can imagine, if you don't have full access to your registers, it is
really hard to write compression code. So we decided not to do that. Actually I think
Ben said please, do I have to do this? And I said okay, we will wait until the next time.
And we do have a platform now that we are selling. We are selling this guy for research
purposes. There is my model again. Thank you, Jeremy [inaudible]. If you would like
to play with that or have one, come talk to me later.
Oh wow, a checkpoint of the first half of my talk. It is a good loop latch. Hopefully I am
not going to repeat. Should I just start over [laughter]? So Mementos is all about energy
aware checkpoints to help these batteryless devices get long-running things done. The work
won't necessarily get done quickly, but it will at least get done. And it is using the LLVM
compiler optimization passes to make this possible, trace driven simulators so that you
don't have to do it on hardware and find out that it doesn't work anyway. You can do it
on software first before you get into the nitty-gritty of circuits. And the applications,
there are all kinds of fun little things. We used to think of crypto and implantable
medical devices, but I am working with one evolutionary biologist who wants to use
some of this for insect scale tracking, so we recently were talking to experts from Cornell
who are micro-surgeons for crickets, to be able to implant these inside crickets for
tracking. But I am not sure if that is going to work out. It would be interesting if it does.
So that is a checkpoint, but I think there is a question before I…
>>: I just want to clarify something you said, it sounds like as far as the [inaudible] goes
there's a lot of work to compute the optimal perfect [inaudible] voltage and then add 10%
[inaudible]
>> Kevin Fu: Roughly. And there are other constraints such as you can't just buy a
voltage regulator for any old voltage you want. There are certain manufacturing
constraints, discrete choices available. So any other questions before moving forward?
So the next thing I am going to do is talk about some work that Negin led over the last
couple of years, and it all relates to getting flash memory to work well, saving potentially
upwards of 30 to 50% of the energy by basically violating abstractions. So we are going
to violate abstractions with glee and try to make things still work. Big data, so here is the
typical picture that you see of a data center and you think about massive amounts of
storage and making those run using fewer joules of energy. The kinds of things we are
going to care about though are if you take a look at our doctored photo, up there in the
ceiling, you see a smoke detector. And that smoke detector has flash memory inside of it.
Does anyone know why the smoke detector has memory?
>>: Insurance companies?
>> Kevin Fu: Yes. Well, actually, it is to be used against the insurance companies. Now let me
explain. I recently went to a tear down of
these devices and the manufacturers said that the reason that there is flash memory in
their smoke detector is that, let's say your house burns down. And your insurance agent
or your insurance company goes to the manufacturer saying well, obviously your smoke
detector failed so we are going to sue you. Smoke detector company says let me see that
smoke detector. Let me hit a few buttons, take a look. Oh, according to the flash
memory, you never hit the test button. That means you obviously are an irresponsible
homeowner and I think you are the one liable. So they actually use the flash memory
against the insurance companies and against the homeowners for liability, so it is for
legal reasons.
Anyway, there is flash memory inside there. And there is flash memory in a bunch of
other very low power typically battery-powered devices; I like to call them boring high
volume devices. So thermostats, smoke detectors, implantable devices (those are a little more
exciting), tollway transponders, kind of boring stuff. But there is a lot of it, so there is
over 10 billion in sales of these kinds of things each year in the US alone. Now there is
also flash memory in a thing called solid-state drives, and I know many people are
excited about that. We are talking about things much deeper on a smaller scale. Some of
these things I'm going to talk about might apply to solid-state drives in the future, but
right now the price points are not very good to making this applicable to solid-state
drives.
So let's focus in on the smoke detector. If you crack it open you're going to find a
microcontroller. This one happens to be from Microchip and it has 8 kB of embedded
flash memory. And the interesting thing in the spec for the microcontroller is that it tells
you that you can run it at, say, 2.2 volts or you can run it at 4.5 volts, and this gets interesting
from a power consumption perspective because power consumption is proportional to V
squared. So if you run the thing at a high voltage it is not just linearly worse; it is
quadratically worse. So let's think about how the manufacturer would use this in
deciding which voltage to set. In the ideal world, you can imagine you have a single chip
that contains your CPU and your flash memory, and it has two pins, the two different
voltages. This allows you to scale your energy proportional with the workload. If you
are not using your flash memory, you just don't bother using the 4.5 volts. If you are
using the flash memory all the time okay, sure you are going to burn up a lot of energy,
but you can dynamically scale it. Unfortunately in the real world, what happens is energy
is proportional to the worst-case, no matter what you are doing. And the reason why is it
is very expensive to put extra pins on a chip. So there is typically just one pin; you get to
pick one voltage to rule them all and so if your CPU needs a low-voltage to run and your
flash needs a really high-voltage, the manufacturer is going to pick the high-voltage
whether or not they use the flash memory, they are going to set it to the high-voltage
because they might need to use the flash memory.
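For rough intuition about what that quadratic dependence costs, if you treat the active power as dominated by CMOS switching power (which scales with the square of the supply voltage), then

    P ∝ V²,  so  (2.2 V / 4.5 V)² ≈ 0.24,

i.e., the same logic running at 2.2 volts draws roughly a quarter of the switching power it draws at 4.5 volts; that factor of four is the headroom a single worst-case supply setting gives away.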
So we are going to try to make these chips behave more in the energy proportional world
even though we are actually running in a world where we have to pick a single voltage to
work for everybody. And we are going to do that by violating specs and abstractions but
still bringing back reliability of our storage. So we are going to try to recover that wasted
energy. Our approach is strictly software-based, so the cool thing is you can change some
software abstractions in how you interact with the microcontroller, and there is this
strange interplay with the physics of flash memory such that you can then save power. It
makes no sense why it should save power strictly from a software perspective, but if you
think of a bigger system, we will show you soon about why we can save power.
Okay. A little bit more about the state-of-the-art. I already told you sort of the way
things are done today is the manufacturer will pick the highest voltage because it is just
the easiest thing to do but then your device won't last as long. It will use up the battery
more quickly. And the other approach, if you have a lot of money, is you just add hardware,
so you could do sort of a poor man's discrete voltage scaling, that is, you could change
your voltage on the fly. Unfortunately, this does require circuitry. And in high volume, low
profit margin devices they really don't want to spend $0.10 on even a couple of transistors.
So this is something that is generally avoided. And then I like to poke fun at Josh Smith.
He was at Intel, now he is at University of Washington. Our original motivation for this
was he gave us one of his WISP tags, this programmable RFID, and we were going to
write to flash memory and he said oh, you can't write to flash. There is flash memory
there but you can't write to it because we set the voltage too low. So they literally just
disabled the flash memory, and we wanted to get it to work, so we wrote to it at the
voltage we had, not the voltage we wanted.
And this picture should start to look pretty familiar. Okay. So the basic idea here is your
standard engineering trade-off. We are going to run the thing at a very low-voltage. We
might get some errors. And we are going to need an error correction routine and we need
to know well, is that error correction going to cost so much that it makes the savings not
worthwhile, or not. Now the good news is, I will give away the ending, we can actually
save more than we need to spend to do the correction on almost all workloads that we
have encountered. So let's go back to a little more inspiration. Who used a punchcard
[laughter]? Are you proud of it?
So in 1982 Rivest and Shamir had this cute paper about what they called wits,
write-once bits. So this was back in the day when write-once memories, WOMs, were fairly
common. You could flip a bit once in one direction, but you could not
flip the bit back. So it was very much like a punchcard, but electronic. And they
figured out a clever coding scheme where they can make the memory rewritable by
actually having the next write be sort of a superset of the previous write and they would
trade-off capacity with rewrite capability.
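The canonical construction from that paper, as I remember it (so treat the details as an illustration rather than a citation), stores a 2-bit value twice in 3 write-once cells: the first write uses codewords with at most one cell set, and a second write of a different value sets every cell except the one that encodes the new value, so cells only ever flip in the allowed direction. A sketch:

    #include <stdint.h>

    /* Rivest-Shamir-style write-once-memory code, reconstructed from memory.
     * Cells start at 0 and can only be flipped to 1.  First generation uses
     * codewords of weight <= 1; second generation uses their complements. */
    static const uint8_t GEN1[4] = { 0x0 /*000*/, 0x4 /*100*/, 0x2 /*010*/, 0x1 /*001*/ };

    uint8_t wom_first_write(uint8_t value)                 /* value in 0..3 */
    {
        return GEN1[value & 3];
    }

    uint8_t wom_second_write(uint8_t cells, uint8_t value) /* cells from first write */
    {
        if (cells == GEN1[value & 3])
            return cells;                                  /* same value: flip nothing */
        return (uint8_t)(~GEN1[value & 3]) & 0x7;          /* superset of any weight-<=1 state */
    }

    uint8_t wom_read(uint8_t cells)
    {
        int ones = ((cells >> 2) & 1) + ((cells >> 1) & 1) + (cells & 1);
        uint8_t pattern = (ones <= 1) ? cells : (uint8_t)(~cells & 0x7);
        for (uint8_t v = 0; v < 4; v++)
            if (GEN1[v] == pattern)
                return v;
        return 0;                                          /* unreachable for valid states */
    }

Two bits written twice in three cells beats the four cells you would need naively, which is the capacity-for-rewritability trade just described.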
So we took a look at that work and we thought well, what if we sort of write to these
write once bits sort of halfway, so this is where our halfwits work comes in. And let me
tell you about just one example of how you might use this approach to write to halfwits.
So we are using NOR flash memory and in NOR flash memory, all bits are born as one.
They are all equal. And let's say you want to write a value to that memory where the two
least significant bits are zero; it seems like a pretty simple operation, but if you are doing it
at low-voltage, you might get some errors. The bit flip might not actually happen. So
imagine that you do the write and then the least significant bit does not flip. That is an
error and we will have to correct for that. That is the bad news.
The good news is this is what you call a Z Channel. It has these asymmetric probabilities
of failures, so if you are trying to write a zero, there is a probability P that it will fail, but
if you are trying to write a one it is never going to fail, because they are born as ones. So
we are going to be able to exploit some of these asymmetries to get better energy
consumption.
>>: Is this independent [inaudible] the same bits?
>> Kevin Fu: No. This is where it gets interesting. So there are a whole bunch of
different factors that you may want to consider. And so Negin did all of the experiments
on this and all of the experimental setups. So we considered a bunch of different things.
The ones at the bottom I am not going to talk about, because we have found no evidence
that they actually influence the probability of error, but we will talk about the first three
factors that influence errors before we get into our approach. We have one obligatory eye
candy picture; you always need one of these in any talk involving electronics. We
now have a digital power supply; this is kind of error-prone, right? If you accidentally
bump it, the voltage changes. But Negin can tell you more about this set up. This is her
beautiful setup here. And we are running at 1.813 volts at the moment and we are testing
the platform with a second device to monitor how well it is performing and actually
how much energy it is using.
So let's get to the data on just how the voltage affects the error rate. It should be obvious
that we are going to get errors. The question is what does that error distribution look
like? So here is a chip that was designed to work at 2.8 volts, I believe.
>>: [inaudible]
>> Kevin Fu: 2.2 volts, sorry. And you can see as you are edging your way down, there
is sort of this cliff where suddenly a whole bunch of errors pop up. And so this was in
one microcontroller. There is a second microcontroller that we had that was the exact
same model and it had a radically different cliff. So there the errors started around 1.92
volts. So variability again comes into the picture because of process variation. If you
start looking at other microcontrollers, of different families of microcontrollers from the
same manufacturer, you're going to get different curves, different points of failure. So all
we know is that there will exist a cliff. We don't know exactly where that cliff is but we
do know that there are pretty big safety margins built into the manufacturing, so anything
above the cliff is wasted energy, basically.
One other thing that I consider sort of just a confirmation of what we would expect is that
Hamming weight is directly influencing our probability of error. So basically if you have
a lot of zeros, you are going to have a higher probability of error, and it just makes sense
because flipping from a one to a zero has some probability of error, while
writing a one involves no change, so there should be no error.
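As a sanity check on that intuition: if you assume each attempted one-to-zero flip fails independently with some probability p at a given voltage (an independence assumption, not something established here), then a byte containing w zeros fails with probability

    P(byte error) = 1 − (1 − p)^w ≈ w · p   for small p,

which grows roughly linearly in the number of zeros and saturates toward 100% as w·p gets large.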
>>: Are these [inaudible] bits or for the whole byte or whatever?
>> Kevin Fu: For the whole byte, so this is the Hamming weight of the byte, how many zeros are
in it, excuse me, how many ones are in it. So if we have all zeros, all
eight zeros in a byte, it looks like maybe an 80 to 90% chance of some error in the byte.
In the back?
>>: Do you assume you are [inaudible] structure [inaudible] or something? Do you only
write to the end of it? You never change…
>> Kevin Fu: We have no file system. So this is raw access. There is no flash
translation.
>>: You're not going to go back and change [inaudible]?
>> Kevin Fu: We are going to do something very much like that. This is just
observational data, trying to figure out how the raw flash data behaves in strange voltage
conditions or strange conditions. This happened to be at 1.9 volts, 1.8 volts. Another
factor to worry about is wearout. I always thought that as you wear out your flash
memory, it just got crappier. But I guess that is not a scientific
term. So what Negin has done here is we had three different areas of flash memory
blocks. It is very small; these microcontrollers do not have a lot of memory. And we
artificially wore out the midsection through several read, erase, write cycles, as that is
more often used. And then we wrote a bit pattern and we checked to see what kind of
error rate we were getting. The lighter color indicates high
errors, and what is interesting to me is that the midsection actually has fewer errors, so
this tells you that the more you wear out your flash memory, the less likely you are going
to get an error. It was counterintuitive to me at first, but then I realized it makes perfect
sense, because what does wearout really mean? It means that you are changing the bias
from bias to one to bias to zero. So the more your flash memory is wearing out, the more
likely, or the easier it is to get it to change to the zero state. It also means that it is harder
to get back to the one state. So that is why those operations get harder. But to me it was
counterintuitive.
>>: [inaudible]
>> Kevin Fu: Sure.
>>: What is the linear [inaudible] binomial because of the more [inaudible]?
>> Kevin Fu: Negin, do you want to take a stab at that?
>> Negin: In each of those you have six zeros or seven zeros so [inaudible].
>>: But it seems like, maybe I am misinterpreting error rate. Isn't it just a probability
that at least one [inaudible] did this wrong? So if you have two bits, you could be wrong
and then double and then shouldn't it be exponential, because the probability that they are
all, the probability that at least one fails, well I guess that [inaudible]
>> Kevin Fu: I will have to think about that. I am not sure. The other problem is you
could have an exponential and we might just be in the portion that looks linear. But to
me it just sort of confirms yeah, the more zeros, the more errors.
>>: [inaudible] upside down approaching one hundred percent error rate [laughter].
>> Kevin Fu: All sorts of fun tricks you can do with graphs. Well, there is one other
property I want to talk about, and it is sort of the most important one: I did
not know at first that flash memory has an accumulative behavior. The way flash
memory works is that it represents zeros and ones by how much charge is in the cell.
And so if you attempt to do a write to flash memory and you fail to actually flip a bit, you
have actually partially succeeded. You got a little bit of charge to go into that flash cell
and so the next time you try to do a write, it is going to be a lot easier to do, because there
is already a little bit of charge there. And that is the primary property we are going to
exploit with our software abstractions to reduce the power consumption to let things run
at a very low voltage, and to build a coping mechanisms for when we do accidentally run
into an error.
So I am going to skip a little bit forward; I don't think we need to talk too much about
the model to understand how this is going to work. But I will just talk about some of the
design goals. We want to minimize the energy consumption. We also
want to minimize error rate, and delay is also important, because if you don't have a
predictable amount of time before your data can be written, it makes it a lot harder for the
system developer. Of course, you might not be able to get all three at once, but let's talk
about three different approaches and how they affect these different minimizations.
So we tried three different approaches to deal with errors at low-voltage. The first one
we called in place writes. This is where you can imagine you do a write to some location
and you just have a feedback loop: you keep writing to it until it sticks. The second one,
multiple place writes, is where we write a value to one location and then we write the same
value to a second location. Now there might be an error, but because the errors are these
Z Channel kinds of errors, if we ever see a zero, we know that it is a
proper zero. If we ever see a one, we don't really know what it is. So we can do a logical
AND of all of these locations at read time to figure out what the value was. Then the
third approach was more traditional: using error correction codes, Reed-Solomon Berger
codes, so throwing computation at the problem rather than just throwing writes to
memory at the problem. Yes?
>>: Quick question. At this point you are assuming that [inaudible] consumes less
energy than one write in high-voltage, right?
>> Kevin Fu: It's all about the workload.
>>: This is something that I should hope for, right, because if two writes at low voltage
consume more power than [inaudible].
>> Kevin Fu: It's actually slightly more complicated than that. So what you are pointing
out is what happens if our error recovery mechanism costs a lot of energy all the
time. That would be bad. So the good news is errors are pretty rare. So we don't usually
need to use our error correction mechanisms, but when we do need to use the error
correction mechanism, the question is well, is it a small reasonable cost or is it such a
huge cost that it is going to make the whole process pointless from the beginning, and
that depends on the workload. It is going to depend on whether you are write bound or
something else bound. But the good news is most workloads do actually work out for
this kind of scheme.
>>: Isn't it also the case that sometimes you don't have a choice? You are working with
low-voltage and [inaudible]
>> Kevin Fu: Exactly. Thanks for bringing up that point. So there are two different
regimes you can think about for low-power. There is low-power because you are trying
to save power and you have chosen to set a constant low-voltage to make your device last
longer, but the other regime is that you've got this energy harvesting device and you have
no control over your voltage and you have to live with whatever you have got. And that's
where we started from. We were given a low-voltage and we could either choose not to
save anything. In fact we went so far as to actually store things remotely over the air
because it was cheaper to do that than to bring the voltage up higher to write the flash
memory, so some really weird things happen in low-voltage, some very strange design
decisions.
So now comes the part where we have beer [laughter]. So we are going to represent
our bits as these beer mugs using negative logic. So here is a beer mug initialized to one,
and here is a beer mug set to zero; you can see it has lots of charge inside of it.
And we are going to use this to illustrate how these three different schemes work. So for
the in place writes, again, we are going to do these repeated writes, so we start with our
memory initialized to one and we want to make it into a zero. So we are going to add
charge, but at a low-voltage. So it is slowly dribbling out and our bit tender is putting
some beer in there. And it might take one or two or maybe three attempts to actually get
enough charge in there before it would be interpreted as a logical zero. So this is sort of
bending the notion of analog and digital. We are kind of doing hybrid digital analog all
at once in order to save power. Now this took three attempts. That is not very ideal.
That means it would take three times as long to do a write. Question?
>>: [inaudible] because it comes to [inaudible] it is a zero writing, does that mean that it
would also be zero 10 seconds later when you read it again?
>> Kevin Fu: That is a good question. So we have yet to find any errors of that nature
where it has flipped to a zero and then flipped back on its own. Now that is not to say it
cannot happen. I am pretty sure there is some way that could happen, but the only good
news is that we have not encountered it.
With that kind of scheme in mind, the sort of feedback loop where you keep rewriting until
you get the bit you want, you can imagine that we have on the X axis the number of
potential rewrites and then the error rate after that number of rewrites. If this approach is
to work, we would sort of expect a quick drop-off to a very low error rate after just a few
writes. If it does not work, well, we would see no change. So the good news is that at all
sorts of different voltages, you very quickly see a reduction in your error rate. So already
at 1.87 volts, if you do four in place writes you are going to eliminate pretty much all of
the errors. So you can actually just have a static routine with no feedback that just does
four writes and you would virtually be certain to have no errors. If you are willing to
have slightly higher voltages, which means you would not get the same energy savings,
you can get much higher reliability, so here after just one write, at 1.9 volts we didn't get
any errors detected in these many trials. So there you can kind of think of this feedback
loop as just an insurance policy that we never needed to execute. That is the ideal case.
We get really, really close to that wall where error starts to happen, but we never actually
hit the wall. So that is our best case scenario.
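In code, the in-place scheme is basically a read-back-and-retry loop around the raw flash write. A sketch (the low-level helper names are placeholders; a real implementation also has to respect the flash controller's timing and segment-erase rules):

    #include <stdint.h>

    /* Hypothetical low-level helpers -- placeholders only. */
    extern void    flash_raw_write_byte(uint8_t *addr, uint8_t value);
    extern uint8_t flash_read_byte(const uint8_t *addr);

    #define MAX_ATTEMPTS 4   /* at ~1.87 V, four attempts eliminated essentially all errors in our data */

    /* Returns the number of attempts used, or -1 if the value never stuck. */
    int inplace_write_byte(uint8_t *addr, uint8_t value)
    {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            flash_raw_write_byte(addr, value);       /* may only partially charge some cells */
            if (flash_read_byte(addr) == value)
                return attempt;                      /* accumulated charge finally reads back correctly */
        }
        return -1;                                   /* caller can fall back, e.g., retry at a higher voltage */
    }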
The other approach we call multiple place writes. So this is in collaboration with
Andrew Jiang, who has been here before to speak on rank modulation and multilevel flash
cells. So here we are going to show how to do multiple places. So imagine you have two
different bit locations, and in order to do writes you have to write to two locations. So
here you can see that we ended up with a one there and a zero at the bottom, because
we did not actually get enough charge into the Texas mug, so I will make fun of
Texas [laughter]. And then all you have to do at read time is do a logical AND. Because of
this asymmetrical error, you do the AND and you will get the actual value. So as long as
one of the bits was written correctly, you will read the correct value. We thought that this
would work a lot better. But it was nowhere near as competitive as the in place writes,
and my intuition on why this is the case is this particular mechanism does not exploit the
accumulative behavior of flash memory and so the accumulative behavior really helps out
the in place writes but it does not help multiple place writes, unless you combine them.
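For comparison, the multiple-place scheme looks roughly like this (again a sketch with the same placeholder helpers): write the same byte to two pre-erased locations, and AND them together at read time, since in this Z-channel error model any zero you observe is a genuine zero.

    #include <stdint.h>

    extern void    flash_raw_write_byte(uint8_t *addr, uint8_t value);
    extern uint8_t flash_read_byte(const uint8_t *addr);

    /* Write the same value to two pre-erased locations (both start as 0xFF). */
    void multiplace_write_byte(uint8_t *a, uint8_t *b, uint8_t value)
    {
        flash_raw_write_byte(a, value);
        flash_raw_write_byte(b, value);
    }

    uint8_t multiplace_read_byte(const uint8_t *a, const uint8_t *b)
    {
        /* Zero-writes can fail (the bit stays 1), but a 1 is never spuriously
         * turned into a 0, so ANDing the copies recovers the intended value as
         * long as each intended zero stuck in at least one copy. */
        return flash_read_byte(a) & flash_read_byte(b);
    }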
>>: It also seems like the flash memory is more homogenous [inaudible] it surprised me
>> Kevin Fu: Yes. I would've expected to see much more variation, but none was
detected, nothing significant. I am not even going to talk about the error correction codes,
the Reed-Solomon Berger codes, because they were about two orders of magnitude worse. It just
turns out that there is so much computation involved that basically you had
to pay an upfront fee to do the error correction before the first write, and you might
never need that error correction, and therefore you are always paying this penalty whether
or not you had an error, so we found that this feedback loop was much more effective
because we only paid for an error when there was an error.
I should move on since I am running out of time. We'll just do a couple of comparisons.
So, just to explain the difference between in place and multiple place
writes in terms of time and energy consumption: they were pretty close, and because they
are so close it is almost obvious that you want to use in place writes because it uses less
space. So if you were to use multiple place writes you would have a minimum of a 2X
increase in how much flash memory you would need to use to get the same job done. So
at the moment we see no reason to use anything other than in place writes.
Moving on. So, micro-benchmarks. Let's compare doing these in place writes at a low voltage versus just the standard approach if we were running at a high voltage. So there
are no real benchmarks for this yet, so we had to come up with our own. So we are going
to think about things that are sort of read-intensive and things that are write-intensive and
combine them. So we've got four different environments that we are going to run in. The
bluish colors represent our experiment, our in place writes at different voltages, and the
yellow orangeish ones represent the standard worlds, different microcontrollers have
different thresholds for writing to flash. So the first thing we are doing is the RC five
block cipher which obviously doesn't have a lot of storage; it is mostly computation. So
you can see that if you were to pay for the ability to have the potential to write to flash,
you are proportional with the worst-case scenario of energy consumption. This is
normalized energy consumption. And if you were to make it sort of more opportunistic
using halfwits you can save 30 to 50% just on your energy consumption. The difference
between the darker and lighter color is just how much reliability. The darker one, there is
a potential for getting an error, small potential. And then the lighter one we have yet to
find any errors, so it depends if you would accept having any kind of write errors.
The retrieve workload is very similar because there are hardly any writes going on there.
Again we make the energy proportional to the workload. The bad news is that storing is
sort of the worst thing you can do. I think this relates to the earlier question about what
if it takes more energy to store: yes, it takes more energy to store using this technique.
But I claim, so what. If you had a write-bound workload, where every single cycle was a
write to flash memory, yeah, you would see about 50%, you know, it would be twice as
bad as using the original scheme. The good news is that most workloads are not of that
nature. Your smoke detector is doing other things besides writing to flash memory. And,
in fact, one manufacturer we are working with gave us a device where they do a one-byte,
I believe it is a one-byte, write to flash about every 5 seconds, so we are talking about a
huge ratio in terms of how much non-writing they do, but they pay for writing the whole
time.
So the hypothesis is that these low-voltage writes and the halfwits regime would be much
better than the high-voltage system for typical workloads. I will just give you one
workload, a typical sensor-network monitoring application. Here we are going to read
256 bytes of accelerometer data, compute some rather simple statistics, and then store
that aggregated information to flash memory. Keeping it pretty high level, we were able
to get about a 34% energy savings, and again it is because we exploited this accumulative
property of the flash memory and were able to drive the voltage much lower, getting
pretty close to the actual safety margin rather than some artificial spec that sets a very
high safety margin for writing to flash memory.
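[Editor's note: a minimal C sketch of the sensor-monitoring workload just described, storing only aggregated statistics with the low-voltage in-place write; read_accel_byte and inplace_write are hypothetical helpers, not code from the talk.]

    #include <stdint.h>

    extern uint8_t read_accel_byte(void);                            /* hypothetical sensor read */
    extern int inplace_write(volatile uint8_t *addr, uint8_t value); /* low-voltage verify-and-retry write */

    /* Sample 256 bytes of accelerometer data, compute simple statistics, and
     * store only the small aggregate to flash at low voltage. */
    void log_accel_summary(volatile uint8_t *flash_dst)
    {
        uint32_t sum = 0;
        uint8_t min = 0xFF, max = 0;

        for (int i = 0; i < 256; i++) {
            uint8_t sample = read_accel_byte();
            sum += sample;
            if (sample < min) min = sample;
            if (sample > max) max = sample;
        }

        inplace_write(&flash_dst[0], (uint8_t)(sum / 256)); /* mean */
        inplace_write(&flash_dst[1], min);
        inplace_write(&flash_dst[2], max);
    }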
Here are some raw numbers just to let you think about how things are actually working.
If you were working in the standard model at 2.2 volts to do the workload I just talked
about, we measured 410 microjoules; this was measured by looking at the voltage drop
on a capacitor. But if you use our technique of in-place writes at 1.9 volts, where we
have yet to encounter an error, the energy consumption was only 300 microjoules, so a
pretty significant savings with no penalty for errors. Now let's say you take the approach
that errors are okay. There are many applications that are okay with errors; think of it
like UDP for storage. If you don't mind dropping down to 1.8 volts you might get an
error after two writes, and if it is okay that every now and then you get an error, you can
still have additional savings, down to 270 microjoules. But if you do care about having
perfect data, we found that at least for this microcontroller 1.9 volts is sort of a sweet
spot that works well enough before you hit the curve.
Okay. There are all kinds of other little improvements you can make. One is kind of fun:
if you have data that is always filled with a bunch of zeros, maybe you should flip all of
your bits. So if you have a sign bit, just store the complement, and it will actually take
less energy, because you are less likely to have errors. This comes up in EKG data, for
instance; if you are storing medical data, you are going to see a lot of zeros. And you can
also do compression, things like memory-mapping tables that basically use codes to
reduce how many zeros you need to write.
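[Editor's note: a minimal C sketch of the bit-flipping idea above, storing the complement of mostly-zero data along with a flip flag; this per-byte encoding is illustrative, not the exact scheme from the talk.]

    #include <stdint.h>

    /* If more than half of a byte's bits are zero, store its complement instead,
     * so that most stored bits are ones (the erased state) and fewer bits need
     * to be programmed. The flip flag has to be stored alongside the data. */
    uint8_t encode_for_flash(uint8_t value, uint8_t *flipped)
    {
        int zeros = 0;
        for (int i = 0; i < 8; i++)
            if (((value >> i) & 1u) == 0)
                zeros++;

        *flipped = (zeros > 4) ? 1 : 0;
        return *flipped ? (uint8_t)~value : value;
    }

    uint8_t decode_from_flash(uint8_t stored, uint8_t flipped)
    {
        return flipped ? (uint8_t)~stored : stored;
    }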
So to summarize this low-voltage approach to flash memory: it is a regime where we are
trying to make our energy consumption proportional to the workload rather than to the
worst case. Microcontrollers today are basically being abused for cost and manufacturing
reasons; they run at a much higher voltage than necessary. Our abstractions are forcing
us to do this, so we basically violate the digital abstraction in order to save energy, and
we are able to bring back some of that reliability through these error-correction
mechanisms.
And we are actually considering some commercialization of this. I will just say that we
call our commercial version of this Smash Memory, for smarter flash; it sounds a lot
better than flarter [laughter]. But you can start to think about how, if you use this
halfwits regime, you can actually reduce your manufacturing cost, because you can now
live with components that have higher variability. If you had a resistor with 5% versus
10% tolerance, well, now you can use the crappier ones and still get away with it. You
can use smaller batteries and make the device smaller and lighter, and you can also make
it greener, because there are fewer chemicals going into the environment as you scatter
these things around. And I will just leave you with this: there are a lot of devices out
there that could benefit from this. I had Negin working with a number of undergraduates
on what we called looking for 100 microcontroller products in 100 days, and we did all
sorts of power measurements to figure out what the microcontrollers were doing and
what they could do, and there is a lot of waste out there. Okay. So that is halfwits. Oh,
was that a question?
>>: Some of the things that there are funny.
>>: You talked a lot about the writes but didn't touch too much on the erases [inaudible].
>> Kevin Fu: Yeah, at the moment, you know, it is just a Turing machine with an
infinite tape. So I am assuming we don't need to do the erases right now, or only
occasionally, because we are working mainly with energy-harvesting devices; we assume
that occasionally we will have these happy power seasons where we do have a lot of
energy and we can do sort of an erase of everything that is not being used. That may not
be the case in all applications. I don't know; Negin, do you have any comments on erase?
>> Negin: [inaudible] we never saw any errors because of erase [inaudible] we don't get
any errors [inaudible]
>>: Even as the memory wears out and gets harder to…
>> Negin: At that point yes, but during the [inaudible]
>>: And does that require a higher voltage?
>> Negin: No. Just the same.
>> Kevin Fu: I think [inaudible] in the back first?
>>: It seems similar to frequency scaling: if, instead of [inaudible], you clock the
frequency a lot higher, you're going to finish your job much faster and save energy that
way. Any comments or are…
>> Kevin Fu: I have a couple of comments about frequency, about choosing the
appropriate frequency. One is no comment, and the second is that the workloads you see
on these kinds of devices are not the typical workloads you would have on a desktop or
on a node in a data center. They tend to be doing a lot of sampling, for instance. You
could speed up your clock to get more things done, but whether you actually need that
granularity is not clear.
>>: [inaudible] compute card [inaudible]
>> Kevin Fu: Typically the bottleneck is storage, how much you can store, because
storing to flash memory is an order of magnitude worse than the CPU, usually. Do you
have anything to add?
>> Negin: Yes, that's right, and there is another approach we call slow writes, where we
just slow down the writes to flash memory and the errors drop. It is much better than
just [inaudible].
>> Kevin Fu: So there are different ways to get similar effects.
>>: You mentioned flash and fire alarms to settle insurance companies’ lawsuits. What
do the fire alarm lawyers say about [inaudible] conflicts [laughter]?
>> Kevin Fu: I think it is the motivation. I think it will be a struggle to actually have
this work in a smoke detector in practice, because they have to pass UL inspection, and
anything that involves high criticality has huge safety margins; for instance, although the
battery is rated for X years, they only advertise it as X over two years in order to get
their safety rating. So there could be some uphill battles there, but I think this is the kind
of case where there are certain policies in place that might not make sense anymore, and
there are trade-offs, but we can quantify those trade-offs.
>>: For a fixed reliability, suppose you were given a reliability target of say 90% or
95%, whatever, and you could set the voltage as you like, what would be the ideal?
Because my intuition is that you would just set it just above that cliff and then not do the
multiple writes.
>> Kevin Fu: Well, there are a couple of complicating factors. First of all, static versus
dynamic; I will just fill that in since you asked the question earlier. I think you could do
a lot more with dynamic: if you had errors and you have control over your voltage, you
could certainly choose a higher voltage. But if you don't have control over your voltage,
all bets are off; you don't have a choice anyway. Let's see how to address that. I think it
is going to be tricky, because as you saw in the earlier slides, even chips from the same
family have very different cliffs, so you could pick a voltage, but it might be different for
the other chip. So what I would recommend is that you profile 50 or 100 chips and then
just take the maximum value and say, well, according to process variation that seems to
be about as high as the cliff would ever get. Set the threshold there, and that would
probably be a good static setting. You wouldn't want anything more conservative than
that. There is no need to be more conservative unless the chip itself changes while
keeping the same model number. So if you are a company making a billion widgets and
you always use the same chip, you could do this, unless the manufacturer of the
microcontroller changes its behavior without telling you; then you would have to adjust.
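[Editor's note: a minimal C sketch of the static calibration suggested above, profiling a batch of chips and setting the fleet-wide write voltage to the highest cliff observed; measure_cliff_voltage is a hypothetical per-chip profiling routine.]

    extern double measure_cliff_voltage(int chip_id);  /* hypothetical: lowest voltage at which this chip's writes stay reliable */

    /* Profile a batch of chips (say 50 to 100) and return the highest cliff
     * observed; the worst chip sets the static write voltage for the fleet. */
    double pick_static_write_voltage(int num_chips)
    {
        double worst_cliff = 0.0;
        for (int id = 0; id < num_chips; id++) {
            double cliff = measure_cliff_voltage(id);
            if (cliff > worst_cliff)
                worst_cliff = cliff;
        }
        return worst_cliff;
    }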
Okay. I will just wrap up with a couple of other weird things we are doing in my lab.
One thing we are doing to get these embedded devices to work well is handing out
implantable medical devices, not for human use. But if you are interested in using
medical devices for your research, come and talk to us; we are loaning these out. They
are from patients who did not need them anymore [laughter]. So for your homework,
besides that, you can start to think about all the interesting problems in this space of
very, very low-power computing. There are interesting hardware issues, like what you
can actually move off of the chip to save energy. There are some interesting circuit
problems too, and memory choices. There are also all sorts of interesting options
available: most of us use flash memory today, but phase-change memory and ferroelectric
RAM and all of these other technologies are coming along with different kinds of error
properties, and some of these, we believe, may work well with this halfwits kind of
model. And then on the OS and PL side, there are lots of fun things to do, like task
scheduling and compile-time optimizations; I think it is sort of fertile ground where not a
lot has been tried yet. And there are probably some really cool ideas from the ‘60s, ‘70s
and ‘80s, maybe even things that were used on punch cards that don't make sense
anymore, but might make sense now because we are going so low power.
If you want to learn more, and if you don't want to keep repeating your computations, or
our computations, you can go to our website, our Roman Empire, SPQR.CS.UMass.EDU.
That is all I have for today, so I would be happy to take any questions now or offline,
but thank you for coming.
[applause]