>> Ming Zhang: Hello, everybody. It's my pleasure to introduce Abhinav
Pathak. He started as an intern at MSR in 2009, and during that time, he worked
on the problem of how to split computation jobs from mobile systems to the
cloud. That motivated his follow-up work on using models to predict energy
usage on smartphones, and then he went on to work on a bunch of things about
how to debug energy consumption on smartphones and how to profile energy
consumption on smartphones, which generated a lot of press coverage recently
in BBC News, MIT Technology Review, et cetera. And he will cover these topics
during today's talk.
>> Abhinav Pathak: Thank you, Ming, for the introduction. Hello, everyone,
I'm Abhinav Pathak. I come from Purdue University, and for those who don't know
where Purdue is, it's right in the middle of nowhere, surrounded by miles and
miles of corn fields, flat land everywhere you look.
So today, I'll be talking about some of the work that goes into my Ph.D.
thesis, which is energy debugging in smartphones. It's old news by now that
smartphones are selling more than PCs. We have been seeing this since the last
quarter of 2010, through 2011 and 2012, and the trend keeps increasing. What we
are also seeing is that people really use their phones. They depend on their
phones, and it's predicted that the number of people who access the internet
from mobile devices, from smartphones, will exceed the number of people who
access the internet from PCs.
This talk is about one of the most critical problems in smartphones, which is
energy consumption. Why is this a problem? Energy is very, very scarce on
smartphones. Smartphones come with a limited battery, and if you look at the
last 15 years of research in battery energy density, battery energy density has
only doubled. Whereas modern-day smartphones are getting faster CPUs, multiple
cores, 3G, 4G, retina displays, GPS, multiple cameras, and basically there's a
growing gap between the producer of energy and the consumers of energy.
Since energy is becoming a very, very critical problem, what we have started
observing in smartphones is a new kind of bug, which we call energy bugs.
What is this phenomenon? Whenever an energy bug strikes you, you get a single
symptom: from that point onwards, your phone starts draining battery at a
very, very high rate. That is the case for the most notorious energy bugs.
You don't see an application crash. You don't see a blue screen of death, like
OS crash or something. Everything works fine. Your Facebook runs, your Gmail
runs, your client, everything, all the applications are running. But then you
just observe that the phone is draining battery at a very high rate.
And normally, this is very frustrating. The common perception is that some app
has gone rogue, so let's open the task manager and kill some applications. In
some cases, it helps. In some cases, it doesn't. In most of the cases, it only
makes the problem worse.
Irrespective of that, it is very, very frustrating. And we started down this
line: what is this new thing we are observing? What are these energy bugs?
Where do they come from? What are we dealing with here? We never observed this
in PCs and laptops.
So we did a lot of data collection just to understand what these things are.
We went to mobile internet forums. These are places where people post the
problems they're facing with their phones. We went to four really popular mobile
forums. We collected about a million posts from these users. We did some data
mining on these posts and pulled out posts related to energy drain or severe
energy drain. We had 39,000 posts related to energy. We did some clustering,
and once we got a thousand clusters, I sat down and read all of them, manually
trying to build a taxonomy of why this problem is happening.
We also went to mobile code repositories, like the Android and Maemo open
source repositories, where we were looking for the patches that fix energy
problems so we'd get some insight into why these problems appear in
applications. And with all this information, and some information we obtained
from the tool we built, which I'm going to talk about today, the energy
profiler, we built an energy bug taxonomy to see where the problem comes from.
Let's see. Where does this energy bug come from? Well, the application is one
of the most obvious places to look: something goes wrong in the application,
and now your phone starts draining energy at a very, very high rate. If you use
an Android phone and you make a phone call, depending on the version and the
hardware, irrespective of how long you talk, after you hang up the phone and
put the phone in your pocket, your phone is going to drain battery. It actually
may [indiscernible]. Even though you're not using the phone, it's just in your
pocket.
The problem can come from the framework, like Android or Maemo. Something is
wrong there, something is programmed wrong, or things are not working the way
you want them to. The problem can come from the OS, and this is very common in
[indiscernible]: every time a new version of iOS comes out, a lot of people
scream that the new version has a lot of energy problems. For the first
time, after the iPhone 4S came out, Apple acknowledged that iOS had a problem,
had an energy bug. They fixed it. Even that version had a problem. They
fixed another one. Even that had a problem. Finally, they were able to make
the problem go away.
We found that the problem can come from device drivers. Something is wrong in
the device drivers; things are not working the way you expected them to. Or
from the hardware. People are complaining that even your hardware could be a
source of the problem. Something is wrong with the battery. Something is wrong
with the SIM card. Something is wrong with the [indiscernible] card. And the
problem can even come from network operators. Something is wrong in the network
and the applications now start draining a huge amount of energy.
So out of this huge number of posts, 39,000 posts, we did a classification of
which categories these posts fall into, just to give some rough numbers.
Hardware was 23%, software was 25%, external was 12%, and 30% of people didn't
realize where the problem was coming from. In hardware, the problem can come
from external hardware or internal hardware. You have a dock, let's say, and
now there's the [indiscernible]. So you place the phone on it, and your phone
drains battery. Or the problem comes from your charger. Your charger is faulty
and is not charging your phone properly. Or it could come from internal
hardware: your battery has gone old, or the interface between the battery and
the operating system is not correct, so it's not able to read how much battery
is left.
Your SD card has a problem: some corrupt sectors in the SD card, and now your
applications are spinning. Your SIM card has a problem: it's probably an
old-generation SIM card, or something went wrong with it. Any application that
accesses contacts from your SIM card is going to drain a huge amount of
battery. It just hangs there and tries again and again.
The problem could come from an external source. For example, an external
service crashed, a mail server crashed, and now your mobile is trying every
five minutes to authenticate to the server. This goes on for hours and hours
and hours, and now
you're draining battery without doing anything good.
The problem can come from network signal strength. You're not in a good signal
strength area, and the recent numbers we have show that if you move to a bad
signal strength area, the energy consumed to do a certain amount of work can
increase by ten times.
Or it could come from wireless handovers. You're moving on the road, shuffling
between 3G and 2G or EDGE. The phone is in your pocket, nothing is going on,
and now the battery is gone completely. Yeah?
>>: What are you calling signal strength?
>> Abhinav Pathak: It follows from the symptoms. Anything that shows the
symptom of an unexpected energy drain, we term an energy bug. You can call it
an optimization problem, or you can --
>>: I guess using the phone is a bug.
>> Abhinav Pathak: Unexpected -- you use the phone, it is still slightly
expected that the energy will go down.
>>: [indiscernible].
>> Abhinav Pathak: Okay. Most of the problems are, let's say, optimization
problems. They're trying to optimize energy. But in some of the cases we are
seeing, you're not even using the phone. It's in your pocket.
The application is one of the main sources of energy bugs, and we found that
there are three kinds of bugs coming out of applications, which we call
no-sleep bugs -- I'll deal with these in detail in this talk -- energy loopers,
where something bad happens and now your phone is looping in a loop for no good
reason, draining a lot of energy, and energy immortals, where something bad
happened in your application and it reached a state where it is draining a huge
amount of energy. Irrespective of whether you kill it and restart it, it will
still spawn from the same buggy state. Even if you reboot the phone and restart
the application, it will spawn from the same buggy state, basically meaning if
the app enters that state, you cannot ever use it again.
>>: So [indiscernible] just an Android issue or have you validated this on
Windows phone or iPhone?
>> Abhinav Pathak: So again, the study comes from these posts, and for
whichever phones are popular, you'll expect more posts on that side of things.
Here, we are looking more at Android and iPhone, and a little bit at the Maemo
side of things. But the findings are pretty common. It's across different
phones.
>>: Even for the iPhone?
>> Abhinav Pathak: Even for the iPhone, yeah. Application bugs are classified
into three categories, and I'll talk about them in detail in this talk. This
talk is focusing on application energy bugs.
So you have an application, it's very popular. A lot of people are using it.
But suddenly, now people have started complaining that your app has a huge
amount of battery drain, and it's really hurting your business. You want to
debug this problem.
Now, what's the first question you want to ask when you want to start debugging
this energy problem? The first question is: where is energy spent inside my
application? Only if I have this information can I maybe start looking at how
to fix it.
And when we started this research in 2009, 2010, we found out that there's
hardly any tool available which answers this kind of question. And this thesis
builds an energy profiler which answers this question, where is energy spent
inside your application: which process, which thread, which routine in your
application is taking how much energy; why is it taking that much energy; what
can you do to fix it. Things like that.
This is built on a fine-grained model which is capable of predicting the energy
consumption of the phone very, very accurately. However, this is still a
semi-automatic approach to debugging energy. You have a profiler. You need a
developer. Run it, get the data, fix the bugs, again and again, in a loop.
We can still build things in an automated manner, targeting each one of these
application energy bugs in isolation. For example, we have bugs in different
categories: no-sleep bugs, loopers, and energy immortals. We can get a lot of
ideas from different areas of computer science, and I'll show in this talk one
particular example where we use a compilers approach to solve no-sleep bugs,
one of the most notorious kinds of energy bugs in applications, automatically,
without the use of a developer.
So let's look at the first part: where is energy spent inside my application?
To answer this question, it's very simple. I need to do three things. First, I
need to track power activities on the phone. Second, I need to track
application activities on the phone. And third, I need to match these two
things: which application is taking how much energy. Sounds pretty simple.
Let's see.
How do I track power activities on the phone? Simple. Layman's solution. Use
a power meter. A very expensive instrument. Perform some surgery on the phone
and now this equipment can get you very accurate energy readings of the phone.
But the problem is this equipment only gives you the entire phone's energy
consumption. It doesn't give you which application, which [indiscernible]
application, which [indiscernible] and so on, those kinds of information. So
the second option is to use a power model: build a software power model. When
we started, we looked back at 2009, 2010, at what power models were available
for servers, desktops, PCs, laptops, and mobile phones at that time.
We found a lot of power models that are something called utilization-based
models. They have two phases: one training phase, one prediction phase. In the
training phase, we actually measure the power consumption, and we measure the
triggers around which we want to build the model. We do some mathematics and
get a model out.
In the prediction phase, we use those triggers again, we use the model, and we
estimate the power consumption as the output.
All of the research on servers, laptops, PCs, and smartphones -- most of it fell
into a category of what we call utilization-based models, and they say your
hardware is using energy only when you're actively utilizing it. They come up
with a very, very simple equation: the modeled energy is equal to, let's say,
how many packets I have sent over the network multiplied by some constant, plus
how many packets I have received over the network multiplied by some constant,
plus how much CPU I have utilized multiplied by some constant. Similarly for
every other component: write a simple linear regression equation and you get
the power.
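As a rough illustration of the kind of equation being described -- a minimal
sketch with made-up coefficients and counter names, not figures from the
talk -- such a utilization-based model boils down to a weighted sum of
per-component utilization counters:

    // Minimal sketch of a utilization-based (linear regression) power model.
    // The coefficients would be fitted against a power meter in the training
    // phase; the values and counter names here are illustrative placeholders.
    final class UtilizationModel {
        double cCpu = 0.8;        // energy per unit of CPU utilization (made up)
        double cPktSent = 0.02;   // energy per packet sent (made up)
        double cPktRecv = 0.015;  // energy per packet received (made up)

        double predictEnergy(double cpuUtilization, long packetsSent, long packetsReceived) {
            // Energy = sum over components of (utilization counter * fitted constant).
            return cCpu * cpuUtilization
                 + cPktSent * packetsSent
                 + cPktRecv * packetsReceived;
        }
    }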
When we picked this model up and applied it to smartphones, we found that these
models failed terribly. Why do these models fail? The first, yet very, very
intuitive, assumption that these models make is that only active utilization
implies energy consumption. That's not true on phones.
Your phone can be draining energy -- a hardware component of the phone can be
draining energy -- even though nobody is actively using that particular
hardware. I'll show you examples of this.
The second assumption here is an implicit assumption that energy consumption is
linear to the amount of work. And that is why you can multiply how much
utilization of the network by some constant. Basically, they're saying if I
send ten packets, I consume X amount of energy, so sending 20 packets implies
2X amount of energy. That's not true either.
Sending 20 packets can consume 3X, 5X, or in a [indiscernible] case, half X
amount of energy. You have a question?
>>: [indiscernible] radios, this model certainly doesn't exist. There are
models, for example, for listening power, transmit power, [indiscernible]. So
it's certainly not this model where, if you're not transmitting or using it,
it's not using energy.
>> Abhinav Pathak: Um-hmm. So we observed that a few people had started at
that time looking at individual hardware components and saying, you know, the
power model is very, very complicated here; the simple things don't work. For
example, in the network, especially 3G, people started seeing these kinds of
results and saying, we need this, and things like that. I'll cover that in the
next slide. But we observed that this is true not only for the radio, not only
for [indiscernible], but for most of the components in smartphones. I'll come
to the examples here.
One of the fundamental properties of the power model that we started from is
that we need the power model to tell us which process, which thread, which
function is consuming how much energy, but it's very hard to obtain the
counters that go into this particular equation at those levels. So it's very
hard to get correct energy at these low levels.
>>: [indiscernible] that energy consumed is a linear combination of its
components.
>> Abhinav Pathak: I'm skipping that.
>>: It seems that you're not --
>> Abhinav Pathak: That's true as well. I'm skipping that slide. That is true.
You can't just add energy individually; some components interact in terms of
energy consumption. I'm skipping it in this talk, but we can talk about it.
First example: "only active utilization implies power consumption" is wrong.
We took an HTC Touch phone running Windows Mobile 6.5, ran a simple benchmark,
and connected the phone to a power monitor. The X axis plots the timeline in
seconds. The Y axis plots the current consumed by the entire phone in
milliamps, which gives the power when multiplied by 3.7 volts, the voltage of
the battery. Nothing else is running on the phone. We run a simple benchmark
which opens a file, sleeps for some time, and reads from the file, in a loop.
What we observed, first, is that the moment you do a file open, there's a power
state transition, a trigger. File open takes a few milliseconds to complete,
but then the device consumes a huge amount of power. When you do the file
read, again you see the same thing.
The thing is, in traditional power models, file open is not considered active
utilization of the hardware; only read and write calls are. But we observe that
calls like file open, file remove, file close, file create, all of them are
capable of causing a power state change.
Second example: we took an HTC TyTN II phone, Windows Mobile 6.5, and sent some
packets over the network. Even after you're done sending -- this is on WiFi --
you see a tail for two seconds. No packets are being transferred in that
particular time. Still, the hardware is consuming some power there.
>>: [inaudible] especially the first one, the SD card [indiscernible].
>> Abhinav Pathak: It comes from the driver. We don't know if it is a bug or
it's a feature.
>>: It's part of the OS?
>> Abhinav Pathak: Right, it's part of the device driver as far as we can see,
and device drivers try to manipulate the power of the hardware component. That's
what we try to guess later on. In some cases, it is needed. In the network case,
for example, it is needed because you're expecting more communication and
that's why you stay in a high power state. But irrespective of that fact,
utilization-based models cannot capture these kinds of things.
And the tail phenomenon is very common in 3G. We have observed it in 3G, in the
SD card, in WiFi, in GPS, in [indiscernible] hardware, in [indiscernible] OSes.
This phenomenon is fairly common.
Second assumption: energy scales linearly with the amount of work.
[indiscernible] experiment: we took an HTC TyTN II phone, Windows Mobile 6.5,
and sent packets at a rate of less than 50 packets per second; in a second
experiment, we sent packets at a rate of more than 50 packets per second. This
is what the power profile looks like: the X axis is the timeline, the Y axis is
current.
When sending packets at a rate of less than 50 packets per second, you see 100
to 105 milliamps per spike. Increase the rate, and the power consumed triples.
If you actually go ahead and compute the energy from the graph, the area under
the graph, the energy is not linear there.
What have we learned so far? So far, we have learned that there is a notion of
power states. You do something, something triggers, and you go to a high power
state. You do something, and it comes back to a low power state.
What we assume is that the device drivers are doing this kind of low-level
power manipulation.
>>: I'm a little confused by your previous slide, because you doubled the
rate. So you somehow changed, you said if I send -- when you talked about the
slide originally, you said if I spent [indiscernible]. If I send 20 packets, I
could --
>> Abhinav Pathak: Right.
>>: Here what you've done is you doubled the rate. You didn't double the
amount of packets.
>> Abhinav Pathak: I didn't double the rate. You choose 49 packets per second
and 51 packets per second.
>>: Not double, but you're lengthening the rate.
>> Abhinav Pathak: The rate of packets sent, yeah. So there is something
[indiscernible], that there's a load-based characterization here, a power
characterization. And that's what we see here, that the device drivers --
>>: How do you know that 50 is --
>> Abhinav Pathak: A huge amount of experimentation. In fact, we tried to
capture this in the [indiscernible]. The thing is, device drivers are doing
something intelligent down there, and the idea here is we want to reverse
engineer what the device drivers are doing there. The problem is device drivers
are so [indiscernible], they don't [indiscernible] HTC, Samsung, Apple, and you
basically need to do a black-box reverse engineering of the power states inside
the device drivers.
Okay. [indiscernible] power models don't work. So we went back to the drawing
board. We said, okay, who consumes power in a smartphone? Very simple: hardware
consumes the power. There is nobody else draining battery in the smartphone.
Who drives the hardware? Applications drive the hardware. How does an
application drive the hardware? There's a very, very nice interface called
system calls, through which most of our applications are able to access
hardware. And the idea is, if we can capture the system calls effectively and
build our model around system calls, maybe we can do a better job in terms of
accuracy.
And the advantages are very simple. We capture everything that
utilization-based models see; just look at the parameters of the system call.
We capture the power behavior of those system calls which need not imply active
utilization, like file open, file creation, file delete, file close. And the
most beautiful part of this approach is that system calls have a very, very
nice property: they can be traced back to where they are coming from, which
process, which thread, which routine is doing the system call.
So let's see. The challenge here is we're trying to learn the device driver's
power manipulation, and we are looking only at two things: timeout-based and
workload-based behavior.
So how do we reverse engineer this? The first thing is we use a finite state
machine representation. We move away from linear [indiscernible] equations.
There are nodes in the power model. A node could be a base state, where
nothing is happening in the device; a productive state, where the device is
actually doing some work; or a tail state, where it's waiting, maybe, for
further communication or further activity.
The edges from one node to other nodes are basically transition rules. They
could be system call driven: the start of a system call, the end of a system
call. Or they could be device driver intelligence, and we are looking at two
kinds: timeout-based and workload-based, like 50 packets per second.
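As a sketch of this representation -- with invented class, state, and trigger
names, since the talk doesn't give the actual implementation -- the finite
state machine could be encoded roughly like this:

    // Sketch of the FSM power-model representation described above: states
    // carry a power draw, and edges fire on system-call events, timeouts, or
    // workload thresholds. All names and numbers are illustrative.
    import java.util.HashMap;
    import java.util.Map;

    final class PowerState {
        final String name;          // e.g. "base", "productive", "tail"
        final double milliAmps;     // average current drawn while in this state
        final Map<String, PowerState> edges = new HashMap<>();  // trigger -> next state
        PowerState(String name, double milliAmps) { this.name = name; this.milliAmps = milliAmps; }
    }

    final class FsmPowerModel {
        private PowerState current;
        FsmPowerModel(PowerState base) { current = base; }

        // Triggers are events like "send_start", "send_stop", "timeout_12s",
        // or workload thresholds like "rate_above_50pps".
        void onTrigger(String trigger) {
            PowerState next = current.edges.get(trigger);
            if (next != null) current = next;
        }

        double currentDrawMilliAmps() { return current.milliAmps; }
    }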
>>: So the base state tends to be a different [indiscernible].
>> Abhinav Pathak: Um-hmm.
>>: So is that itself a state and how the transition happens?
>> Abhinav Pathak: So base state for CPU would be nothing is going on. But --
>>: So the production state would be the --
>> Abhinav Pathak: There could be multiple production states, there could be
multiple tail states for the hardware. But CPU we are handling slightly
differently. We are noting down the frequency, retaining the value of --
>>: [indiscernible] you will have different nodes.
>> Abhinav Pathak: Right. That is there in the model, based on signal
strength, based on -- but based on the rate you're sending, you need to go to a
different power state. I'll show that in the slide.
The approach is very simple. It's black-box reverse engineering. We don't know
what the device driver is doing. We don't have the source code. We call it a
systematic brute-force approach: we're going to try everything possible.
What we'll do is take all the system calls going to a particular hardware
component and try to build a finite state machine for each of these system
calls.
And once we have the finite state machines for all the system calls to a hardware,
we'll try to merge the finite state machines of different system calls going to
the same hardware. This requires domain knowledge. You need to know how the
SD card works. You need to know how GPS works, what are the [indiscernible],
and so on.
>>: So when all these other [indiscernible]. Otherwise, you can't model a
single system call.
>> Abhinav Pathak: Not really. We'll try to combine things as well. The
behavior --
>>: [inaudible].
>> Abhinav Pathak: Because from every state, we'll try to generate different --
again, it's a brute-force approach. We're trying everything possible. I'll come
to that in a slide.
>>: [indiscernible].
>> Abhinav Pathak: Yes. So let's see how we deal with a first, single system
call. We have the read system call, which has a file descriptor, a buffer and a
size. We run a particular simple application which reads something from disk,
and we get this kind of trace: the file read call starts, the file read call
ends. The first step we do is digitize it. We say you are in the base state
when nothing is happening on the SD card. You did something, you go to a high
power state. You stay there for as long as you're doing the work described by
the parameters of the system call. You come to a d2 state, a tail state, and
then you go back to the base state, and we basically just convert this into a
finite state machine representation, which is exactly the same thing.
You are in the base state, B. A file read triggers you out of there. You stay
as long as you're doing the work, you come to the tail state, you stay there
for a certain timeout, and you go back to the base state. To answer your
question: now, here, we'll try to do the read call from different states. Just
the read call. We'll try to see what happens if you do another read call when
you're in the high disk state. What happens when you do another read call when
you're in the tail state? So you are getting one finite state machine for one
system call.
>>: It seems like there's still dependency, though, like what else is running
on the phone. Just wondering how complete is this approach in practice?
>> Abhinav Pathak: So when we're doing this experiment, we make sure nothing
else, only the minimal things needed to keep the phone working, are running.
>>: You just have a base phone and the only app that's running --
>> Abhinav Pathak: Only that app is running. But there could be interference
here. For example, you see some things going on on top here from different
components, but you need to average things out around those. We are
controlling the phone entirely when we are building this model, so it's in our
hands.
>>: So how do you know the firmware is not doing something? You said you're
controlling the phone.
>> Abhinav Pathak: Right.
>>: The firmware might actually be doing something to the hardware underneath;
you have no control, you have no visibility.
>> Abhinav Pathak: Right. So we assume the firmware and the device drivers do
two things: they manage power based on timeouts or they manage power based on
[indiscernible]. These are the two kinds of power state changes they
manipulate. However, the firmware could be doing something like, if it is
raining outside, I change the power. I'm not looking for that. It's black-box
reverse engineering, so I won't capture that. A complicated [indiscernible]
management, I'm not looking for that. I won't capture it. It's black-box
reverse engineering.
>>: [indiscernible] something like fraction of reads and writes
[indiscernible].
>> Abhinav Pathak: Sure.
>>: So that's what I was asking. [indiscernible] or just focus on multiple
latency [indiscernible] or do you look at all the possible [indiscernible].
>> Abhinav Pathak: Okay. So one thing you're referring to is what happens when
caching is there and things like that. We'll handle that separately. But
low-level things, like the device driver could take a read, do it later, do
some other things -- we're not getting any information out of that. We're not
capturing it. We don't observe it in phones. If we observed it in phones, this
wouldn't work.
>>: You seem to ignore the power [indiscernible] of the phone saying I'm here,
I'm here, I'm here. Is that [indiscernible] inevitable?
>> Abhinav Pathak: I don't understand. What do you mean by --
>>: A cell phone keeps telling the tower where it is.
>> Abhinav Pathak: Right.
>>: That also eats power.
>> Abhinav Pathak: Right. So that goes into 3G energy consumption, the radio
energy consumption. So if you're modeling for radio, you need to capture that.
If you're modeling for screen, you need to capture what is going on on the
screen. If you're modeling for WiFi, you capture what is going on in the WiFi.
When there are workload-based changes, like 50 packets per second and so on,
what we do is basically change the size of the system call exponentially to see
if we observe any power characteristics different from the simple model, and we
build the power model accordingly.
>>: I'm somewhat confused on how you capture path dependence. Because wouldn't
you need a different Markov model if you're issuing two consecutive reads
spaced by, let's say, ten milliseconds?
>> Abhinav Pathak: Okay. So the thing is, are the reads interfering or are the
reads not interfering? Let's say they're not interfering. So basically, what
you're seeing, when you're in [indiscernible] -- well, you just model it. You
take the phone in that state, you do a read call, and then you model whether it
goes into a high [indiscernible] state and stays there.
What happens when there are multiple system calls which overlap, going to the
same hardware? Then you need to see in which order they started and in which
order they ended. We don't have that information.
What we simply do is combine the workload of those two read calls for the time
when they are executing, and we just stay in the high power state for that.
Second step: modeling multiple system calls going to the same component. The
observation here is that this kind of power management works at a very, very
low level in device drivers. We don't expect there to be a huge number of
states, thousands or something like that, because programming at that low level
is very hard. What we have seen in most of the devices is that there are very,
very few states, three, four, five, and the idea is that if there are very few
states, you will see these states being repeated across system calls, across
the finite state machines of different system calls.
And the idea is you sit down -- you need a human -- and identify which are the
common power states. How do we do it? Take the WiFi NIC on the HTC Touch phone.
The send system call finite state machine looks like this: you are in the base
state; you send at a rate of less than 30 packets per second, you go to the low
network state; cross the threshold, you come to the high network state; then
come to the network tail; after a timeout of 12 seconds, you go back to the
base state.
A socket close call on the same phone will take you from the tail state to the
base state. You need a human to sit down there and identify that this 110
milliamperes in the network tail of the socket close call is the same as the
110 milliamperes in your send system call, basically, and then sit down and
combine them. If you're in the network tail state, either you have a socket
close or a timeout, whichever happens first, and you go back to the base state.
And you try combining different states for different system calls going to the
same component.
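To make that concrete, the merged WiFi NIC machine just described could be
wired up using the earlier sketch; the 30 packets/second threshold, the
12-second tail timeout, and the 110-milliamp tail come from the talk, while the
other currents and the trigger names are made up:

    // Illustrative wiring of the merged WiFi NIC FSM, using the PowerState and
    // FsmPowerModel classes from the earlier sketch.
    final class WifiNicModel {
        static FsmPowerModel build() {
            PowerState base    = new PowerState("base", 0);
            PowerState lowNet  = new PowerState("low_network", 200);   // made-up current
            PowerState highNet = new PowerState("high_network", 400);  // made-up current
            PowerState netTail = new PowerState("network_tail", 110);  // 110 mA tail

            base.edges.put("send_below_30pps", lowNet);     // send at < 30 packets/sec
            lowNet.edges.put("rate_above_30pps", highNet);  // cross the threshold
            lowNet.edges.put("send_stop", netTail);
            highNet.edges.put("send_stop", netTail);
            netTail.edges.put("timeout_12s", base);         // 12-second tail timeout
            netTail.edges.put("socket_close", base);        // socket close also goes to base
            return new FsmPowerModel(base);
        }
    }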
>>: Why do you care about the combination?
>> Abhinav Pathak: Because when you're running an actual system, it may happen
that your send call takes you to one state and now there's another system call.
Now, that system call won't start from the base state, because your component
is in a different power state right now, and you need to know what happens if
that call comes in while in that power state.
>>: So why don't you just simply combine all the states for the same power?
>> Abhinav Pathak: You need to know which power states are the same, actually.
>>: So why --
>> Abhinav Pathak: [inaudible].
>>: Can't you just do that based on the current run?
>> Abhinav Pathak: In all the experiments we did, yes. But it can happen that
two power states have the same value. So you need to start sitting there and
seeing what transitions are coming into this state and what transitions are
going out of this state. Using that information, you realize whether they are
the same power state or not.
But yes, in everything that we have seen, we could decide based on the power
value.
>>: Do you have one big giant [indiscernible] machine that represents all the
-- the combinations of all states of all power codes.
>> Abhinav Pathak: Right. So for most of the hardware in Android, let's say,
different components don't interact in terms of power. So you build one finite
state machine for every hardware component. For example, this finite state
machine is for the WiFi NIC.
>>: So you don't have --
>> Abhinav Pathak: In Windows Mobile, yes, you do, because things start
interacting, and now you can't just work with a single finite state machine for
one hardware component. I'm skipping that.
>>: So are you manually --
>> Abhinav Pathak: Yes.
>>: [inaudible].
>> Abhinav Pathak: Right.
>>: So I'm wondering are you manually mapping each -- [indiscernible].
>> Abhinav Pathak: Right.
>>: So it's file system call --
>> Abhinav Pathak: Correct.
>>: [indiscernible].
>> Abhinav Pathak: So for every hardware component, you get whatever system
calls go to the [indiscernible]. For example, for the SD card, it's file read
and so on. You build a finite state machine for every system call. For most of
the system calls, you don't [indiscernible]. So the first step is you build up
all the system calls that are capable of manipulating the power state.
>>: Manually.
>> Abhinav Pathak: Yeah, it's manual right now. And then you combine, start
combining things.
>>: [indiscernible].
>> Abhinav Pathak: We didn't observe this, but if hardware components interact
in terms of power, then implicitly, yes. But the question is, how many system
calls impact a hardware component, in our experience? Not more than five, six,
depending on what hardware you're using. So it's still a small space. You can
do things manually.
And moreover, power modeling is a one-time effort. It's all right if I spend
some time; once I have the models, I can just distribute them to everybody. I
don't need to redo it again until the hardware changes or the device driver
changes.
>>: [indiscernible]. It uses less power than [indiscernible].
>> Abhinav Pathak: Right.
>>: So I guess [indiscernible].
>> Abhinav Pathak: Right. So if the device driver changes, it is possible that
the power model changes, because they can go ahead and change it. And there's
no way around it right now. You have to actually go ahead and redo this. Think
of this as writing a device driver: for every OS, you need to write a device
driver, and then you need to build a power model. Yeah.
>>: Are there things that don't [indiscernible] memory mapped file?
>> Abhinav Pathak: Yeah, most applications use system calls as the primary
interface to access the OS. But there are ways around it. You can do app
builds, you can do mmap, things like that. Right now, we don't cover those, but
we can possibly extend it.
>>: What about [indiscernible]? Because of the delay effect, you do a write but
eventually a worker thread decides to batch it up 30 seconds later or --
>> Abhinav Pathak: Caching. Any cache layers. There are several cache layers.
There's caching at the GPS. There's caching at the SD card. There's caching at
the network and things like that. The general idea is you need to log
everything above the cache and everything below the cache, and then you know
when the hardware is getting triggered, when the requests are coming in. We did
it for GPS on Android, because it's easy to go in at the framework level and do
this kind of stuff. Low-level device drivers, no; right now it's out of our
hands and we don't have this information.
But the framework is the same. You need to log above. You need to log below.
The [indiscernible].
We implemented this in Windows Mobile 6.5 using CeLog, and in Android using
SystemTap. We log kernel events. I'm not going into the details of this;
there's a huge amount of engineering involved here to make sure that the
overhead is very low when the applications are running, so that the logging
framework doesn't itself introduce a lot of energy drain.
The results: what do they look like? We ran different applications on Android
and Windows Mobile 6: YouTube, Facebook, maps, chess, virus scan, document
converter, different things. And we used the finite state machine models and
the linear [indiscernible]-based model, the [indiscernible]. We plot the error
percentage on the Y axis. We run the application for 10, 15, 20 seconds. We see
how much energy is predicted by our model, how much energy is actually drained
using a power meter, and we plot the error bars.
We see the finite state machine model is under 4% error, and the linear
[indiscernible] models are at 1 to 20 percent error. But not only that, when we
look at what happens at fine-grained intervals -- instead of measuring energy
for, like, 20 seconds, look at how good the prediction is at 50 milliseconds --
we found it is very good: 80 percent of the 50-millisecond bins have less than
10% error in our model.
>>: Yes. So the impact of the screen [indiscernible] diminish your error
percentage because it's such a large and constant factor?
>> Abhinav Pathak: Absolutely.
>>: What happens when you look at these numbers taking out the screen content?
>> Abhinav Pathak: Of course they'll increase. [indiscernible] brightness, but
the problem with doing that is, when you take the screen out, at some point
you're predicting some energy consumption where the phone is not actually
consuming energy, and you're getting infinite error, basically. So there are
[indiscernible] where the model, at a very fine grain, [indiscernible] if
you're looking at clock time, there are [indiscernible] where the model returns
100% error sometimes. That is true. But overall, we are pretty good in most of
the 50-millisecond [indiscernible].
>>: [indiscernible] just as a [indiscernible].
>> Abhinav Pathak: Right.
>>: You said less fragment so --
>> Abhinav Pathak: It depends on what hardware you have.
>>: [indiscernible].
>> Abhinav Pathak: It ranges from 25 percent to 75 percent.
>>: 25 to 75?
>> Abhinav Pathak: 25 to 75 percent, yeah. And on an [indiscernible] phone, 25%
[indiscernible]. Again, it depends on the hardware, whether the screen is OLED
or whether the screen is LCD. LCD screens are what I'm quoting. OLED screens
are even less, because it depends on what color the pixels are and things like
that.
>>: [indiscernible].
>> Abhinav Pathak: No. It's all over the space. It's all over the space.
Sometimes we are [indiscernible] predicting and things like that.
>>: So you've described this system with finite state machines that you spent
a lot of time optimizing and getting right. You described the problems with
linear regression. You described four of them, many of which one might
correct. For example, one might try [indiscernible]. How do I know that, if
you had spent the same amount of time trying to get a regression model right,
you wouldn't have done better than you did here, given the number of things
that a researcher interested in seeing the regression model succeed might have
done to better optimize the regression model?
>> Abhinav Pathak: Right. So there are multiple ways you can optimize
regression models. You can build different models, finite state machines, and
so on. And once we built the model, we observed that the most important
property of our model is not the accuracy. It is something else -- I'll go to
it in the next slide. It is that you're tracking it back to the application,
and now you can do energy accounting.
But yes, you can make linear regression better. Of course, that's maybe
possible. We have not done that. But if you're working on performance
counters, it's very hard to answer the question of which thread, which routine
is consuming how much energy.
>>: Whenever you get per-application, you set up resources and you can easily
build a --
>> Abhinav Pathak: If you get those kinds of numbers, that's good. Which
routine is consuming how much CPU, which routine is consuming how much disk,
how much network. If you get that kind of --
>>: [inaudible].
>> Abhinav Pathak: At the application layer, you get those numbers. But we
started from the fact that we want that kind of information for which routine,
which thread.
>>: As a scientist, I'm still really bothered here, because we've gotten --
you're doing a test that's comparing two things. But one of the things that's
different about these two models is that one's finite state based and one's a
regression model. Another thing that's different is you're using different
independent variables. In one you're using performance counters, as you said,
and in one you're using system calls.
>> Abhinav Pathak: Right.
>>: And you could have built the finite state machine -- you could have built
the linear regression based on every one of the independent variables in your
linear regression, the number of times the system call was called.
>> Abhinav Pathak: Um-hmm. I'm not saying that that cannot be done. That can
be done. You can [indiscernible] linear regression based models. You can make
them accurate. But the problem is, until and unless you remove those
assumptions from those models, you cannot get them right. The assumptions are:
only active utilization; energy scales linearly. If you remove those
assumptions, then probably you can do a better job, irrespective of what
mathematical model you use, be it linear regression, finite state machine, and
so on.
>>: So, a question as a Seattle resident, not a scientist. To restate the
scientist's question, there are two things you changed. One is you went from
instrumenting [indiscernible] to system calls. You went from a linear
regression model to a finite state model. In your intuition, which is the real
[indiscernible]? Is one of them key and the other one irrelevant?
>> Abhinav Pathak: So there are two things. One is accuracy: how good you are
in terms of prediction. And what we know from the power behavior is that there
is actually finite-state-based power management inside the device drivers.
Something happens to change the power; something happens, you bring back the
power. So we are good at capturing that.
The second important thing is -- next couple of slides -- how we track it back
to the application.
>>: I don't understand. You're saying finite state machine is the important
part?
>> Abhinav Pathak: Finite state machine is an important part. Equally
important is the second half that's based on system calls.
>>: Just trying to get a sense of the size of the finite state machine.
>> Abhinav Pathak: We observed four or five states in most of the hardware, and
a few transitions. That's the size for most of the hardware components.
>>: One other question. Do you know why linear regression is doing better in
the [indiscernible] scenarios?
>> Abhinav Pathak: The game is mostly CPU. The CPU is close to linear if you're
running at the highest frequency. That's why it's performing roughly better
when you're looking at energy consumption. I removed the graph that shows what
happens for individual predictions. There, in some places, it is
overestimating; in some places it is underestimating. So when you're looking at
the aggregate, you're getting a really good result. They're cancelling each
other out.
So we started with this question: where is energy spent inside my application?
We did the power model. The second thing is we need to track application
activity, and the granularity of tracking depends on what your requirement is.
Do you want per routine, per thread, maybe a combination? So we say let's try
to do per routine, because that's the modular unit in programming languages.
We use gprof-like mechanisms to predict how much CPU energy is being spent in
different routines when they're running on the CPU, because it's very hard,
very high overhead, to profile when a routine starts and when a routine ends.
We use sampling, like the gprof PLDI '82 paper.
The third thing is you need to map power activity to app activity, and we used
the most important part of this finite state machine model: that it's based on
system calls. And now you can track system calls all the way to where they're
coming from. Get the PID: which process is doing it. Get the TID: which thread
is doing it. Backtrace the stack: which routine is doing it. And now you can
do energy accounting.
Then there is one problem of lingering energy consumption, and we need an
accounting policy for this.
So what is lingering energy consumption? The first case is very simple. The
tail energies. Let's say a routine foo sends ten kilobytes of data. The send
is done. But then, even then, 3G consumes a high tail energy for up to seven
seconds. This foo is the cause of this tail energy. But even after foo is
completed, the thread is completed, the process is completed, the hardware
lingers on and consumes more energy. You need to take care, when you're
accounting the energy back to different entities.
The second case of lingering energy consumption comes from something called
persistent-state wake locks, and now we are going to the second half of the
talk. What are wake locks? Smartphones have very, very aggressive sleeping
policies. The moment you stop interacting with the phone, within a few seconds,
the screen shuts down, the CPU shuts down, the system is frozen, and that is
why your phone lasts a day in your pocket.
This creates a lot of problems for programmers, because your program may be in
the middle of doing something very important, like talking to the server, and
the CPU sleeps, and the server says, where has the phone gone? So the
smartphone OSes give developers some APIs using which they can keep components
on explicitly, even though the user is not interacting.
For example, you're talking on Skype. You need the screen to be on. You don't
want to touch your phone every five seconds. The screen needs to be on from the
start of the call, and the screen needs to go off when the call has ended.
The thing is, let's say a foo routine says keep the screen on. It acquires a
wake lock for the screen. Then foo is over, but the screen continues to consume
energy even though foo is over. It's consuming energy on behalf of foo. And
this is another example of lingering energy consumption.
So let's see how we deal with lingering energy consumption. This is an
accounting question. I do a send, send ten kilobytes. There is a tail.
We split the energy into utilization-based energy and tail-based energy, and
the first thing we say is these two energies are different, U and T. Let's
represent energy as a tuple rather than [indiscernible]. And in this example,
it's very simple, very easy to observe that the [indiscernible] tuple should be
assigned or accounted completely to this send system call and whoever is
calling the send system call. Nobody else is responsible for it.
What happens when there are multiple system calls? I do a send one. I come into
the tail, but before the tail is over, a send two comes in or a receive comes
in. There is a utilization energy and a tail energy for the first, and a
utilization energy and a tail energy for the second. First thing: the first
tail, T1, belongs to send one, because send one started that tail. The question
is, how do you split T2 among these two system calls?
There are several policies. There's nothing right or wrong here. Let's explore
the policies first. First is the average policy: split the tail energy T2 in
some weighted ratio among send one and send two. But the problem with this is
that it's not always easy to define weights. For example, instead of send one,
let's say it's a file open system call and send two is a file read system call.
How do you define the weight for file open? It could be a connect system call,
it could be a send system call.
And the problem is it gets complicated if there are a huge number of system
calls; you need to wait until all of them are over and then start dividing them
up. We take a very simple approach. We call it the last-trigger policy. We say
forget about everything else; we are representing energy differently; allocate
T2 completely to the last guy who called in: send two. And go on. You don't
need to define weights. The policy is not complicated.
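A minimal sketch of that last-trigger accounting, with invented names and under
the assumption that utilization and tail energy are reported separately by the
profiler, might look like this:

    // Sketch of the last-trigger tail-accounting policy: utilization energy is
    // charged to the caller that did the work, and tail energy is charged
    // entirely to the last caller that triggered the hardware.
    import java.util.HashMap;
    import java.util.Map;

    final class LastTriggerAccounting {
        // caller (e.g. routine or thread name) -> {utilization energy, tail energy}
        final Map<String, double[]> ledger = new HashMap<>();
        private String lastTrigger;

        void onSyscall(String caller, double utilizationEnergy) {
            ledger.computeIfAbsent(caller, k -> new double[2])[0] += utilizationEnergy;
            lastTrigger = caller;   // this caller now owns whatever tail follows
        }

        void onTailEnergy(double tailEnergy) {
            if (lastTrigger != null)
                ledger.get(lastTrigger)[1] += tailEnergy;  // whole tail to the last trigger
        }
    }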
>>: Do you want to punish the guy? The guy who goes first is responsible for
taking you to a state where you're going to have the phone on for a while. The
guy who is second, if he times it right, he's just being opportunistic. So the
only additional cost really incurred, if I as an application decide, hey,
somebody else has activated the phone, I should take advantage of it, is the
difference in time between when my call ends and when the previous guy's ends,
because that's the additional time it has to stay awake because of me. So if
you take a last-trigger policy, then you aren't giving any reward to
applications that time their events opportunistically to minimize power.
They're always actually being punished. They're incentivized to actually waste
power under the last-trigger policy.
>> Abhinav Pathak: This is an accounting policy. There's nothing right, nothing
wrong. It's just about which one gives more information to the developer to
debug their application's energy consumption.
>>: By that token, [indiscernible], right. I think, like, yes, it's a policy,
but policies have incentives.
>> Abhinav Pathak: Right. Policies have incentives. This is, again, not a
[indiscernible] system. At the end of it, you're collecting the traces, you're
predicting energy. What we realize is that at the end of the day, whichever
policy you use, you come up with a flat energy representation: how much energy
routine X consumes, how much energy routine Y consumes. And this is very
limited in terms of information, irrespective of what policy you choose, when
you run eprof on an application.
We came up with a new representation, which we call bundles, which actually
takes this question completely off the table. We are not accounting tail energy
at all, but we are representing the energy consumption to the developer in such
a way that he can quickly understand what is wrong in the application, what
he's doing wrong, and he can optimize that, because that's our main goal.
>>: I don't think there's any way to do that intelligently at this point. So I
think it's just -- so I think that's an important feature you could add to a
system, to say, I want to just -- if someone else is sending, I want to send
then. But I don't think currently, in the operating system as far as I know,
you can do that. If you could, that might be a problem. But right now, I don't
know if you can [indiscernible], so the incentive is to add the API, right?
>> Abhinav Pathak: So once you have such an API, you can do much, much more.
Then you can get incentives. Then you can notify applications of more opportune
moments to do things. But that is a completely --
>>: Then you could actually have a system where you batch everything
intelligently. And if you just want to be an opportunistic app, then you could
get a lot of benefits.
>>: I'll bring it up at the end of the talk.
>>: There isn't, so you can't --
>>: [indiscernible].
>>: You can't do it yet. [indiscernible].
>> Abhinav Pathak: So implementing the procedure [indiscernible] is very
simple. You do [indiscernible]. You have the application, you embed energy
APIs to start the tracing and stop the tracing, and you install it on a mobile
OS where you have system call tracing enabled. You run the application, gather
[indiscernible], run it on a server to get an in-depth energy profile, and get
the representation, the bundle representation, that I'm talking about. It's not
in the slides, but we can talk about it. The runtime overhead of the system was
2% to 15%, and the energy overhead of the system was 1% to 13% additional
energy incurred by tracing.
We used this on several popular applications like Angry Birds, the browser,
Facebook, New York Times. And we had several insights on where energy is being
spent. The first one is that free apps are spending a lot of energy doing
things you don't need. I'll come back to it in the next slide.
A major finding is that I/O consumes a lot of energy. The CPU is not the
bottleneck in terms of energy; the CPU is 10%, less than 15%, of the energy.
And what we'll show here is that eprof helps detect something called no-sleep
bugs.
First, a sample case study: Angry Birds. In one simple game run, 30 to 35
seconds, shooting three birds, we found that user tracking -- tracking the user
and uploading this information -- consumed 45% of the energy. Fetching
advertisements consumed another 28% of the energy. The core gameplay, the
physics engine, consumed only 20% of the energy.
>>: [indiscernible].
>> Abhinav Pathak: [indiscernible] before Ice Cream Sandwich, the screen is
allocated -- the accounting is done based on wake locks. It is accounted to
whoever is holding the screen wake lock, which is basically the Android
[indiscernible] process. So the app doesn't get any share of the screen energy.
But if we were to fold that energy in here, these numbers would go down:
instead of 75% of the energy going into advertising, something like 40% or 45%
of the energy goes into advertising.
This was picked up by the popular press very recently, and there's a nice story
around it, how the press gets very aggressive. They first said people
[indiscernible] Angry Birds on the phone, 70% of the energy is going in free
applications.
Next round of press releases: all free applications drain 75% of their energy
in advertising. Next round, [indiscernible] says you should buy paid
applications. Next round: this study is from Microsoft; they are trying to show
Android is bad.
You have a question?
>>: For the app here, you're showing activities that's a very high level
application. [indiscernible].
>> Abhinav Pathak: Right. We have a huge profile for this, thousands of
groupings, and then this is basically clustering based on which threads.
There's a Flurry thread that comes along with this app -- it does all this
tracking and getting out of [indiscernible]. So we aggregated those numbers.
Second example. We found something called a wake lock bug in the Facebook
application. I run the Facebook application and I observe something called a
Facebook service consuming something like 25% of the energy. What is this
Facebook service? The Facebook service basically polls the [indiscernible] to
see notifications: somebody sent you a friend request, somebody wrote on your
wall. And this was surprising, 25% of the energy, because during the
experiment, nobody wrote on my wall. Nobody sent a friend request. I'm not
that popular anyway.
What I found was that, out of this 25%, roughly 25% of it was going into an
acquire-wake-lock routine in app session. What it was basically doing was
telling the CPU, please keep the CPU up, I'm doing something very important,
and it never told the CPU otherwise, that the CPU was free to sleep. And all
the energy of the CPU in the low idle state is accounted to this guy. This is
an energy bug, very common in the Facebook application.
We switch gears now. We built an energy profiler, which is a semi-automatic
approach. Now we go to an automated approach for finding no-sleep bugs, one of
the most notorious kinds of energy bugs. We'll show a compilers approach.
What are no-sleep bugs? Where do they come from? What is this new problem
here? There's a big philosophical shift in power management when you move from
desktops to mobile. Everybody here has written programs in Java, C, C++ and
other languages. How often in your application source code did you need to
take care that the CPU is on? How often did you need to take care that the
screen is on, that some component is on? Never.
This is because the default power management philosophy, the default thinking,
is that everything is on. It's already there for you. When you move to
smartphones, there's a different power philosophy: everything is off by
default, and if you want something to be on, you need to explicitly ask for it.
Smartphone OSes [indiscernible] turn things off, and they provide a lot of
APIs.
What are these APIs? What do they look like? Let's say I'm doing some
networking, trying to sync my emails over the network. This may take a minute
or so, over 3G, let's say. If in the middle the CPU sleeps, I'll have a
problem, because now the server thinks, where has the mobile gone? What does
the developer do? He acquires something called a wake lock and says to the OS,
please keep the CPU on, I'm doing something very important. When I'm done doing
the important work, I'll release the wake lock, and now you're free to sleep
however you want.
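In Android terms, a minimal sketch of that pattern -- using the real
PowerManager wake lock API, with the sync routine itself being a hypothetical
placeholder -- looks roughly like this:

    // Sketch of the acquire/work/release pattern described above;
    // syncOverNetwork() is a hypothetical placeholder for the sync work.
    import android.content.Context;
    import android.os.PowerManager;

    class EmailSyncer {
        void syncEmail(Context context) {
            PowerManager pm = (PowerManager) context.getSystemService(Context.POWER_SERVICE);
            PowerManager.WakeLock wakeLock =
                    pm.newWakeLock(PowerManager.PARTIAL_WAKE_LOCK, "app:emailSync");
            wakeLock.acquire();    // "please keep the CPU on, I'm doing something important"
            syncOverNetwork();     // hypothetical placeholder for the actual sync
            wakeLock.release();    // "I'm done doing the important work; you're free to sleep"
        }
        void syncOverNetwork() { /* ... */ }
    }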
Another example, from Skype: you need to keep the screen on. These are two
example APIs from Android; there are seven or eight of them from different
companies. But this leads to a new phenomenon which we call power-encumbered
programming: you're pushing all this management of sleep and wake cycles all the
way up to the developers. Some of these developers could be high school kids
trying to earn a quick buck, and even the most proficient programmers make a huge
number of mistakes when they're doing this.
This results in something called a no-sleep bug, which is basically an
application telling a component: please stay on, I'm doing something very
important, and then not always telling it afterwards: I'm done doing that
important thing, you can turn it off.
If that component is the CPU, 50 to 60 percent of the battery goes in 12 hours
without you using the phone. If it is the screen, 100% goes in four to six hours.
If it is the GPS, 100% goes in three to five hours. We have examples in all these
categories, including popular applications like Google Maps, SMS applications,
and Facebook in the CPU category, and so on and so forth.
So we started looking at these no-sleep bugs and asked: why are programmers
making these kinds of errors? What is the problem they are facing? And we
categorized no-sleep bugs into four categories: no-sleep code paths, no-sleep
race conditions, no-sleep dilations, and sleep conflicts. I'll go through three
of them here in this talk.
No-sleep code paths. The programmer actually did say in his application, keep the
CPU up, and somewhere he said the CPU is free to sleep, but the code took a
different path than what the programmer anticipated.
I'll give an example. Let's go back to our do-network example. You're trying to
sync over the network. You acquire and release the wake lock. But let me throw
some Java into it. Your sync over the network throws an exception: something bad
happened in the program, you tried to open something that isn't there, you
divided by zero, things like that. And the programmer just put a try/catch around
this block. When there is an exception, you just print the information about the
error, so you can debug it later, and go ahead.
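As a hedged sketch, reusing the hypothetical syncOverNetwork from before, the
buggy shape being described looks roughly like this; the only point is that the
exception path skips the release:

    import android.os.PowerManager;
    import android.util.Log;
    import java.io.IOException;

    class BuggySync {
        void syncMail(PowerManager.WakeLock lock) {
            lock.acquire();              // tell the CPU: stay awake
            try {
                syncOverNetwork();       // may throw
                lock.release();          // only reached on the success path
            } catch (IOException e) {
                Log.e("BuggySync", "sync failed", e);
                // bug: on this path, nobody ever tells the CPU it may sleep again
            }
        }

        private void syncOverNetwork() throws IOException { /* hypothetical */ }
    }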
What happens? You've introduced a no-sleep bug right there. You come inside, you
tell the CPU, don't go to sleep, you start syncing over the network, you hit the
exception, you catch the error, you print the error, you go on. Who tells the CPU
that it is free to sleep again? Question?
>>: Will all these four bugs go away if wake locks had a timeout in them?
>> Abhinav Pathak: That's a good question, and that's one of the questions we are
looking at. Some wake locks have timeouts in them, but this is a question we're
looking at from the angle of how you improve the programming languages or the
APIs to provide better solutions.
But the problem is, even with a timeout, say I acquire a wake lock for 60 seconds
and after the 60 seconds I still want it to be on. So I acquire the wake lock for
another 60 seconds, and I can make a bug there. The problem still remains. But
yeah, some of it can be taken care of with a better programming construct or a
better API.
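For reference, Android's wake lock does offer a timed acquire; a one-line sketch
of the 60-second case from the question (the surrounding class is just
scaffolding for illustration):

    import android.os.PowerManager;

    class TimedLock {
        void doWork(PowerManager.WakeLock lock) {
            // The system force-releases the lock after 60 seconds even if we forget to,
            // but if the work runs longer we're back to managing the lock by hand,
            // which is exactly the caveat mentioned above.
            lock.acquire(60_000L);  // timeout in milliseconds
            // ... do the work ...
        }
    }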
Yes.
>>: [indiscernible] replace the two calls with an object that has a destructor
[indiscernible].
>> Abhinav Pathak: A destructor is not always guaranteed to be called. We have a
very nice piece of code [indiscernible] coming from Android where the developer
is absolutely confused about what is going on. He put the release in finalize()
and wrote a comment across it saying: I'm not sure if this is going to get
called, because Java doesn't guarantee that finalize() will be called, but it can
happen that some code path doesn't release the wake lock and I don't know which
path it is, so let me just try this. But again, it's not guaranteed.
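In plain Java, one common way to get a destructor-like guarantee without relying
on finalize() is a try/finally block; a minimal sketch, again with the
hypothetical syncOverNetwork:

    import android.os.PowerManager;
    import java.io.IOException;

    class SaferSync {
        void syncMail(PowerManager.WakeLock lock) throws IOException {
            lock.acquire();
            try {
                syncOverNetwork();
            } finally {
                lock.release();  // runs on both the normal and the exception path
            }
        }

        private void syncOverNetwork() throws IOException { /* hypothetical */ }
    }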
>>: [inaudible].
>> Abhinav Pathak: But the problem is, again, most smartphone applications don't
get killed. They stay in the background. So if the application is still around,
the garbage collector won't get [indiscernible], I mean, if the wake lock is
acquired.
>>: I was thinking C++. It would have taken care of this.
>> Abhinav Pathak: That's one of the ways you can solve this problem, maybe. We
observed this bug in several applications: Facebook, Google Calendar, the dialer
app I started with.
No-sleep race conditions are very similar. Let's say a process has two threads,
thread one and thread two. The wiggly line shows the execution path, and time
goes down. The red dot shows where a thread acquires a wake lock: it tells a
component, please stay up. The green dot shows where thread two tells that
particular component, you are free to sleep. If everything happens in that order,
it's all good. But let's say the threads are scheduled differently. Thread two
comes up first and says, component, you can sleep. Then thread one comes up and
says, please stay up, and nobody after that comes and says the component is free
to sleep. The third one is the sleep conflict [indiscernible], and this is a very
interesting problem.
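A toy sketch of that ordering problem, assuming (hypothetically) that the acquire
and the release live on two different threads; if the releasing thread happens to
run first, the later acquire is never matched and the component stays on:

    import android.os.PowerManager;

    class RacyRelease {
        private final PowerManager.WakeLock lock;

        RacyRelease(PowerManager.WakeLock lock) { this.lock = lock; }

        void start() {
            // Thread one: acquires the lock and starts the work.
            new Thread(() -> {
                lock.acquire();
                doWork();
            }).start();

            // Thread two: meant to run after the acquire and release the lock,
            // but nothing enforces that ordering. If it runs first, the guard
            // simply skips the release, and the CPU is then kept awake forever.
            new Thread(() -> {
                if (lock.isHeld()) {
                    lock.release();
                }
            }).start();
        }

        private void doWork() { /* hypothetical */ }
    }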
Let's say I'm doing something on the phone and the [indiscernible] profile looks
like this: the timeline and the power consumed. What I did basically is I started
a WiFi transfer. The WiFi transfer is done, WiFi goes into a network tail, and at
the end of two seconds the device driver kicks in and says, two seconds have gone
by, nobody has done any communication, put the NIC to sleep. It puts the NIC in a
low power state.
So far, so good. What happens when a sleep conflict strikes you? You start your
transfer, your network transfer is done. But while you're in the tail state,
let's say the CPU sleeps aggressively. The problem is that the piece of code that
runs on behalf of the device driver, the code that tells the NIC to go back to a
low power state, will not run because the CPU is sleeping. Your NIC keeps your
phone warm in your pocket.
We have a very nice video where we show a few lines of Android code that cause a
sleep conflict with the vibrator. We asked the vibrator to run for ten seconds,
but it continues to run for ten minutes. And these kinds of bugs are present in
the GPS, the NIC, different hardware components.
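A one-line sketch of the kind of call involved, using the standard Vibrator API
(the surrounding class is just scaffolding); whether the motor actually stops on
time depends on driver code that runs on the CPU, which is exactly what a sleep
conflict prevents:

    import android.content.Context;
    import android.os.Vibrator;

    class VibrateExample {
        void buzz(Context context) {
            Vibrator v = (Vibrator) context.getSystemService(Context.VIBRATOR_SERVICE);
            // Ask for a 10-second vibration. The stop is driven by code running
            // on the CPU; if the CPU sleeps aggressively, nothing awake stops it.
            v.vibrate(10_000L);
        }
    }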
Tracking and debugging all these energy bugs is a very big problem, a very hard
problem, and it requires effort from different parts of computer science like
architecture, programming languages, verification and so on. To show how we can
use compilers, here is a very simple technique from compilers to detect no-sleep
code paths and no-sleep race conditions. We use a solution from the compilers
textbook, something that comes right out of a grad course. It's called the
reaching definitions data flow problem, and the problem statement is pretty
simple: at every point in the program, statically, which definitions of the
different variables can reach here? And you can do a lot of things with this
information.
For example, let's say at this point you have a test like Y greater than some
value. You know which definitions of Y can reach here, and in this example the
only one is Y = 11. So you can remove the [indiscernible] completely. You can
remove the else part and keep just the if part, because the condition will always
be true. You can do a lot of optimizations around reaching definitions data flow.
How do you do it? I'm not going to re-derive it. You build a control flow graph
of the application, you compute some gen sets, some kill sets; it's compilers
101, very simple. At the end of it, you get, at every block, which definitions
are reachable.
For example, here, we say D2 can reach the exit block, and D3 can reach the exit
block, but not D1. We use this, straightforwardly, to try to find no-sleep bugs
due to code paths. We have wake lock acquires, we have wake lock releases. What
we do is build a control flow graph which looks like this: I acquire a wake lock;
if there is an exception, I go to the catch block; otherwise, I release the wake
lock; then I come to exit. I transform this simply by saying, whenever you switch
on a component, I set the value to one; whenever you switch off the component, I
set that value to zero; and I apply the reaching definitions problem straight
away here. At the exit node, I get two definitions reachable, D0 and D1. D0 is
good because its value is zero: the wake lock release is coming in, which says
the component will shut down. That is a good value. D1 is problematic because it
is coming from the exception path. It says the wake lock value is one: you're
reaching the end of the code path, but you're still holding the wake lock.
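A toy-sized, hedged sketch of that transformation: the block names, the lock=0/1
encoding, and the data structures are invented for illustration (the real tool
works on soot's intermediate representation), but the gen/kill iteration is the
standard reaching-definitions computation, and it flags the exception path of the
try/catch example above.

    import java.util.*;

    /** Tiny reaching-definitions check for wake locks. Each block may "define"
     *  the lock state: 1 = acquired, 0 = released. A definition with value 1
     *  reaching the exit block flags a possible no-sleep code path. */
    class NoSleepPathCheck {

        static class Block {
            final String name;
            final Integer lockDef;   // null = no wake-lock call in this block
            final List<Block> succs = new ArrayList<>();
            Set<String> in = new HashSet<>(), out = new HashSet<>();
            Block(String name, Integer lockDef) { this.name = name; this.lockDef = lockDef; }
        }

        static void analyze(List<Block> blocks) {
            boolean changed = true;
            while (changed) {                            // iterate to a fixed point
                changed = false;
                for (Block b : blocks) {
                    Set<String> in = new HashSet<>();
                    for (Block p : blocks)               // IN[b] = union of OUT[predecessors]
                        if (p.succs.contains(b)) in.addAll(p.out);
                    Set<String> out = new HashSet<>(in);
                    if (b.lockDef != null) {
                        out.removeIf(d -> d.startsWith("lock="));      // kill other lock defs
                        out.add("lock=" + b.lockDef + "@" + b.name);   // gen this block's def
                    }
                    if (!in.equals(b.in) || !out.equals(b.out)) {
                        b.in = in; b.out = out; changed = true;
                    }
                }
            }
        }

        public static void main(String[] args) {
            // CFG of the buggy example: acquire -> (release | catch) -> exit
            Block acquire = new Block("acquire", 1);   // wakeLock.acquire() => lock = 1
            Block release = new Block("release", 0);   // wakeLock.release() => lock = 0
            Block handler = new Block("catch", null);  // exception handler, no release
            Block exit    = new Block("exit", null);
            acquire.succs.addAll(List.of(release, handler));
            release.succs.add(exit);
            handler.succs.add(exit);

            List<Block> blocks = List.of(acquire, release, handler, exit);
            analyze(blocks);

            // Any "lock=1" definition reaching exit means some path ends with the lock held.
            exit.in.stream().filter(d -> d.startsWith("lock=1"))
                   .forEach(d -> System.out.println("possible no-sleep path: " + d));
        }
    }

Running this prints the definition coming from the acquire block, i.e. the path
through the catch handler that never releases the lock.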
As simple as it sounds, there's a lot of complication here coming from standard
event-based applications. All these applications are event-based: your
application no longer has a simple main, and the call graph is very hard to build
there. We handle those kinds of issues. We handle Java runtime issues, what
happens when runtime exceptions come in, and things like that. We handle special
code paths. This is a static technique, and we're trying to be conservative; the
moment you try to be conservative, a huge number of false positives kick in. We
apply a lot of special cases to reduce the false positives.
We implemented this in soot, a Java compiler framework. We tested it on 500
market applications, random applications, and we found 48 applications which had
bugs. Most of the problems were incorrect event handling: the developers did not
realize, they do not understand, how the Android event-based model works, how
control transfers from one place to another.
The second category was if/else branches or exceptions, just like the Facebook
bug. In six places, the developer forgot the release entirely: he just acquired
the lock and there's no release anywhere in the code. And there were a few
miscellaneous categories.
To conclude, in this talk I've talked about eprof, the first energy profiler for
smartphones, which gives you information about where energy is going inside your
application: which routine, which thread, which process consumes how much energy.
It's built on a very fine-grained, accurate power model.
This is, again, a semi-automatic approach, and we can build automatic approaches
to track energy bugs. We showed one example from compilers. As future work, you
can use techniques from different areas and solve more energy bugs in an
automated way; you don't need the developer here.
But just to put things into perspective, we are still looking at a very, very
small slice of the problem. You need to track energy bugs in the framework, the
kernel, firmware and hardware, and in the network. And that's the future work
that we propose.
I have just a couple of slides to talk about the other work I have done during my
Ph.D.
One piece of work falls in the spam category. At one point we were tracking a
multimillion-node botnet spewing billions of spam messages every day, and we were
able to characterize some of the features of how the botnet works and so on.
Using that data, we showed that the state-of-the-art spam campaign prediction
does not work: it has a huge number of false positives when it comes to tracking
botnet spam. And we built the first completely unsupervised spam detection
system, which removes the human from the loop entirely.
The second set of work is data center and internet measurement. We worked for
some time with the Bing team to reduce latency between data centers, and between
data centers and clients, and we measured some aspects of routing and the
internet. I'll be happy to talk about these in one-on-one meetings.
And with this, I conclude my talk.
Thank you.
>>: Earlier, when you were talking about accounting policies, you said it doesn't
really matter, you can imagine any kind of policy. It seems like one accounting
policy is better than another if it leads developers to better understand their
bugs.
>> Abhinav Pathak: Right.
>>: Did you find that when you tried one accounting policy over another,
developers in your studies would fail to find their bugs?
>> Abhinav Pathak: So if you're looking for energy bugs, then you need to
allocate energy to whoever is responsible for this kind of stuff. But if you're
looking for optimization, for example, Angry Birds is consuming 75% of its energy
in advertisement, how do I cut it down? If you're trying to look for that kind of
information, then yes, the policy is important.
And we spent a lot of time on this, asking what is the best policy. At the end of
the day, we realized that irrespective of what policy you use, as long as you're
using a flat representation of energy, it's not that [indiscernible].
>>: So it didn't matter. Developers were just as effective --
>> Abhinav Pathak: Right. And that is why we moved away from a flat
representation of energy to something called bundles, bundles of energy, which
tackles specifically why IO is draining so much energy. I haven't covered this in
the talk, but I can talk about it.
>>: Suppose you had, like, [indiscernible] -- suppose you told the user, for
every app, how much energy it used. What impact do you think that would have on
users and developers?
>> Abhinav Pathak: So there are a lot of tools currently running on smartphone
systems that tell users how much energy is being consumed, and there are a lot of
problems with those tools as well. The immediate impact is that users now have
something in their hands they can complain about: see, I used my smartphone, and
this says Facebook is consuming a lot of energy. And now there's a lot of
pressure on Facebook.
Whether developers want to handle it or not, there's then a lot of pressure on
them online. But the set of tools that we are developing is mostly targeted at
developers, because right now they don't have a tool that can help them with
debugging.
You need tools that really work, not just for users but for developers, for
kernel people, framework people, and so on and so forth, because at every level
you need to cut down energy.
>>: Thanks.