Energy Efficient Scheduling

EE5900 Advanced Embedded
System For Smart Infrastructure
Introduction
• Energy consumption is an important issue in embedded systems.
– Mobile and portable devices.
– Laptops, PDAs.
– Mobile and intelligent systems: digital camcorders, cellular phones, and portable medical devices.
• A typical networked embedded system consists of
– Computing subsystem: driven by an embedded processor operated by an RTOS.
– Communication subsystem: consists of a radio chipset driven by firmware.
Figure: A typical embedded system. Blocks shown: battery; computing subsystem (driven by RTOS): microprocessor, digital signal processor (DSP); communication subsystem (driven by firmware): radio, RF amplifiers, A-to-D and D-to-A circuits.
Important Facts (1)
• High performance is needed only for a small fraction of the time; for the rest of the time, a low-performance, low-power processor would suffice.
Figure: Workload versus time. The peak computing rate is needed only occasionally; the average rate would suffice the rest of the time.
Important Facts (2)
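• Processors are based on CMOS technology, where dynamic power (due to switching activity) is the bottleneck.
• $P \propto V^2 \cdot f$
• $V \propto f$
• $E = P \cdot T_{cc}$, with $T_{cc} = CC / f$
• Substituting $V \propto f$ gives $P \propto f^3$, so $E_i = K \cdot CC_i \cdot f^2$
where V is the voltage, P the power, E the energy, $T_{cc}$ the execution time, $CC_i$ the number of clock cycles of task $T_i$, f the frequency at which $T_i$ is run, and K a proportionality constant.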
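The relations above can be checked numerically. This is a minimal sketch, assuming a proportionality constant K = 1 and that V scales exactly with f; the function name `task_energy` is hypothetical and not from the slides.

```python
def task_energy(cycles: float, freq: float, k: float = 1.0) -> float:
    """Energy of one task under the slide's model: E_i = K * CC_i * f^2."""
    if freq <= 0:
        raise ValueError("frequency must be positive")
    power = k * freq ** 3          # P ∝ V^2 * f with V ∝ f gives P ∝ f^3
    exec_time = cycles / freq      # T_cc = CC / f
    return power * exec_time       # E = P * T_cc = K * CC * f^2


# Same 20-cycle task at full and at half frequency (normalized units).
for f in (1.0, 0.5):
    print(f"f = {f}: energy = {task_energy(cycles=20, freq=f)}")
```

Halving the frequency cuts the energy of the same task to one quarter, which is the quadratic dependence stated above.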
Variable Voltage Processors
• Modern processors operate at multiple
frequency levels.
– Crusoe Processor: Transmeta Corporation
– PowerNow! Technology: AMD
– Intel XScale: Intel
• The higher the frequency level, the higher the energy consumption.
Dynamic Voltage Scaling (DVS)
• DVS scales the operating voltage of the processor along with the frequency.
• Since energy is proportional to $f^2$, DVS can potentially provide significant energy savings through frequency and voltage scaling.
Case study (iPhone 5)
• iPhone 5’s power management system
Figure: iPhone 5 power management block diagram. Blocks shown: battery (3.8 V, 5.45 Wh, 1440 mAh); DC/DC down converter; LDO (low-dropout) regulator; computation system operated by RTOS (A6 multiprocessor, memories); computation system operated by firmware (RF modem, power amplifier).
Simple DVS-Scheme
Figure: The DVS block takes the next task from the task queue and sets the frequency by load: f = F when the system is overloaded, f = F/2 when it is underloaded.
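The scheme in the figure can be sketched as follows. The slides do not define how load is measured, so this sketch assumes the system counts as overloaded when at least `overload_threshold` tasks are waiting; that threshold, the normalized `F_MAX`, and both function names are hypothetical.

```python
from collections import deque

F_MAX = 1.0  # normalized maximum frequency F (assumption)


def select_frequency(pending_tasks: int, overload_threshold: int = 2) -> float:
    """Return F when the system is overloaded, F/2 when it is underloaded.

    'Overloaded' is assumed to mean that at least `overload_threshold`
    tasks are waiting; the slides do not define the load test.
    """
    return F_MAX if pending_tasks >= overload_threshold else F_MAX / 2


def run_queue(task_cycles):
    """Dispatch tasks in FIFO order and record (cycles, frequency) per task."""
    queue = deque(task_cycles)
    trace = []
    while queue:
        freq = select_frequency(len(queue))   # decide before taking the next task
        trace.append((queue.popleft(), freq))
    return trace


print(run_queue([20, 10, 5]))
```

Only the last task runs at F/2 here, because the queue is considered overloaded until a single task remains.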
DVS-example
• Consider a task Ti with a computation time of 20 units.
• Energy of Ti without DVS (time taken = t1, say):
– $E_1 = K \cdot 20 \cdot F^2$
• Energy of Ti with DVS at F/2 (time taken = t2 = 2 * t1):
– $E_2 = K \cdot 20 \cdot (F/2)^2$
• Clearly, $E_2 = E_1 / 4$.
Therefore, if we reduce the frequency we save energy, but we spend more time performing the same computation.
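The same trade-off can be tabulated for other slowdown factors. This is an illustrative sketch under the slide's model (energy proportional to f^2, time proportional to 1/f); the function name `dvs_tradeoff` is hypothetical.

```python
def dvs_tradeoff(slowdown: float):
    """Return (energy_ratio, time_ratio) relative to running at F.

    With f = F / slowdown, energy scales as f^2 and time as 1 / f.
    """
    if slowdown < 1:
        raise ValueError("slowdown must be >= 1 (cannot exceed F)")
    return 1.0 / slowdown ** 2, slowdown


for s in (1, 2, 4):
    energy_ratio, time_ratio = dvs_tradeoff(s)
    print(f"F/{s}: energy x{energy_ratio:.4f}, time x{time_ratio}")
```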
Energy-Time Tradeoffs
Figure: Energy savings versus time.
Simple DVS scheme handling RT-task
• Consider a real-time task T1 = (20, 30): computation time 20 at F, deadline 30.
• Applying the simple DVS scheme, either
– T1 runs at the maximum frequency (F) and meets the deadline with no energy savings, or
– T1 runs at half the maximum frequency (F/2) and completes at time = 40, thereby missing its deadline.
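The two cases can be checked with a few lines of code. This sketch assumes a single task released at time 0; the helper names are hypothetical. The last line goes one step beyond the slide and reports the largest slowdown that would still meet the deadline, assuming the processor could run at that intermediate frequency.

```python
from fractions import Fraction


def finish_time(wcet_at_fmax: int, slowdown: Fraction) -> Fraction:
    """Completion time of a task released at t = 0 and run at F / slowdown."""
    return wcet_at_fmax * slowdown


def max_slowdown(wcet_at_fmax: int, deadline: int) -> Fraction:
    """Largest slowdown that still meets the deadline (single task, released at 0)."""
    return Fraction(deadline, wcet_at_fmax)


wcet, deadline = 20, 30            # T1 = (20, 30) from the slide
for s in (Fraction(1), Fraction(2)):
    done = finish_time(wcet, s)
    print(f"F/{s}: finishes at {done}, meets deadline: {done <= deadline}")
print("largest safe slowdown:", max_slowdown(wcet, deadline))
```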
Simple DVS scheme handling RT-task
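Figure: Frequency versus time for T1. No DVS: 20 units at F, completing at time 20, before the deadline at 30. DVS (low workload): 20 units at F/2, completing at time 40, after the deadline.
Inference: DVS cannot be blindly applied to real-time embedded systems.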
Energy aware scheduling in RT Systems
• Objectives
– Minimizing energy consumption
– Meeting the deadlines
Real Time - DVS schemes
• RT-DVS algorithms can be broadly classified by the granularity at which voltage scheduling is performed:
– Inter-task DVS scheme: voltage scheduling is done on a task-by-task basis.
Figure: Timeline of T1, T3, T2 with the voltage scheduling points marked.
– Intra-task DVS scheme: voltage scheduling is done within a task boundary.
Figure: Timeline in which tasks are split into segments (T1…, …T1, T2…, T3, …T2).
Inter-task EDF
• Static voltage scaling EDF
• Cycle conserving RT-DVS
Static Voltage Scaling EDF: Motivation
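Figure: Pre-run schedule with holes. Tasks run for wc1, wc2, wc3, wc4 before the next arrival of T1, where wci is the worst-case computation time of Ti at Fmax.
Holes in the pre-run schedule imply that the EDF test holds strictly:
$\sum_i (wc_i / p_i) < 1$ at frequency = Fmax
where $p_i$ is the period of Ti. In other words, whenever $\sum_i (wc_i / p_i) < 1$ there are holes in the EDF schedule.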
Static Voltage Scaling EDF: exploiting holes
Figure: Pre-run schedule with holes (wc1, wc2, wc3, wc4 before the next arrival of T1; wci = worst-case computation time at Fmax).
The processor typically idles during holes. Instead, the holes can be exploited to slow down the processor and save energy.
Static Voltage Scaling EDF
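Figure: Two schedules up to the next arrival of T1. At Fmax the tasks take wc1, wc2, wc3, wc4; at Fmax/K they take K*wc1, K*wc2, K*wc3, K*wc4.
EDF test: $\sum_i (wc_i / p_i) < 1$ at maximum frequency = Fmax
Static-VS EDF test: $K \cdot \sum_i (wc_i / p_i) = 1$ at frequency = Fmax/K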
Static voltage scaling: Example
• Task set: T1 = (1, 4) and T2 = (2, 8)
• U = 1/4 + 2/8 = 0.5 (< 1) @ Fmax
• What is the “k” at which the task set is still schedulable @ (Fmax / k)?
– Let k = x
– U = (1*x)/4 + (2*x)/8 = x*(0.5) = 1
– x = 2, that is, k = 2
– Therefore, we can operate at f = Fmax / 2 and still meet the deadlines
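The calculation above generalizes to k = 1/U. This is a minimal sketch, assuming tasks are given as (wc_i, p_i) pairs with deadlines equal to periods and that the processor can run at exactly Fmax/k; the function names are hypothetical.

```python
from fractions import Fraction


def utilization(tasks):
    """Sum of wc_i / p_i for tasks given as (wc_i, p_i) pairs at Fmax."""
    return sum(Fraction(wc, p) for wc, p in tasks)


def static_scaling_factor(tasks):
    """Slowdown k with k * U = 1, so the task set runs at Fmax / k."""
    u = utilization(tasks)
    if not 0 < u <= 1:
        raise ValueError("need 0 < U <= 1 at Fmax for a valid slowdown")
    return 1 / u


tasks = [(1, 4), (2, 8)]
print("U =", utilization(tasks), "k =", static_scaling_factor(tasks))
```

A real processor offers only a few discrete frequency levels, so in practice one would pick the lowest available level that is at least Fmax/k.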
Static voltage scaling: Example
Task set: T1 = (1, 4) and T2 = (2, 8)
U = 1/4 + 2/8 = 0.5 (< 1) @ Fmax
Figure: Schedule at Fmax (frequency versus time), with time-axis marks at 0, 1, 3, 4, 5, 8.
Finding the right frequency scaling parameter (say, k):
U = (1*k)/4 + (2*k)/8 = 0.5*k = 1 @ (Fmax/k)
This gives k = 2. Therefore, operating frequency = Fmax/2.
Static voltage scaling: Example
Modified task set @ (Fmax/2): T1 = (2, 4) and T2 = (4, 8)
U = 2/4 + 4/8 = 1 @ (Fmax/2)
Figure: Schedule at Fmax (time-axis marks at 0, 1, 3, 4, 5, 8).
Energy consumption: $1 \cdot F^2 + 2 \cdot F^2 = 3F^2$
Figure: Schedule at Fmax/2 (time-axis marks at 0, 2, 6, 8).
Energy consumption: $1 \cdot (F/2)^2 + 2 \cdot (F/2)^2 = \tfrac{3}{4} F^2$
What if Ci < WCi ?
Figure: Schedule using the actual computation times (K*c1, K*c2, K*c3, K*c4) up to the next arrival of T1.
More holes are left unexploited.
What if Ci < WCi ?
Figure: Task T1 completes after its actual computation time K*c1; the remaining tasks are still budgeted at K*wc2, K*wc3, K*wc4 up to the next arrival of T1.
This leaves a hole of size (wc1 – c1). Slow down all the remaining tasks proportionally.
What if Ci < WCi ? (contd.)
Figure: T1 runs for K*c1; the remaining tasks run for K'*wc2, K'*wc3, K'*wc4 up to the next arrival of T1.
CPU cycles are conserved by slowing down the remaining tasks.
Cycle conserving EDF: Example
• Task set: T1 = (3, 6) and T2 = (6, 12)
• U = 3/6 + 6/12 = 1 @ Fmax
• What is the “k” at which the task set is still schedulable @ (Fmax / k)?
– Let k = x
– U = (3*x)/6 + (6*x)/12 = x*(1.0) = 1
– x = 1, that is, k = 1
– Therefore, we must operate at f = Fmax in order to meet all the deadlines
Cycle conserving EDF: Example
Task set @ (Fmax): T1 = (3, 9) and T2 = (6, 9)
U = 3/9 + 6/9 = 1 @ (Fmax)
Figure: Worst-case schedule at Fmax, with T1 followed by T2 (time-axis marks at 0, 1, 3, 6, 9).
Figure: Actual schedule at Fmax. Task T1 completes in just one unit, creating holes (time-axis marks at 0, 1, 3, 6, 9).
Cycle conserving EDF: Example
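Task set @ (Fmax): T1 = (3, 9) and T2 = (6, 9)
U = 3/9 + 6/9 = 1 @ (Fmax)
Figure: Schedule at Fmax in which task T1 completes in just one unit, creating holes (time-axis marks at 0, 1, 3, 6, 9).
New utilization = 1/9 + 6/9 = 7/9
Finding the right “k”:
1/9 + (6*k)/9 = 1
k = 4/3
This is the right factor: T2 can run at Fmax/k = (3/4)*Fmax, taking 8 units and finishing at time 9.
Figure: Schedule with T1 at Fmax followed by T2 at the reduced frequency (time-axis marks at 0, 1, 3, 6, 9, 12).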
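The recalculation in this example can be sketched as below. It follows the slide's simplified arithmetic (solve elapsed + k * remaining worst-case work = horizon) and is not a full cycle-conserving EDF implementation; the function name and its parameters are hypothetical.

```python
from fractions import Fraction


def rescale_remaining(elapsed, remaining_wc, horizon):
    """Slowdown k for the remaining tasks after an early completion.

    elapsed      -- time already used when the early task finishes
    remaining_wc -- worst-case times (at Fmax) of the tasks still to run
    horizon      -- time by which they must all finish
    Solves elapsed + k * sum(remaining_wc) = horizon.
    """
    budget = Fraction(horizon) - Fraction(elapsed)
    demand = sum(Fraction(wc) for wc in remaining_wc)
    if demand <= 0 or budget < demand:
        raise ValueError("nothing left to run, or remaining work does not fit")
    return budget / demand


# T1 = (3, 9) finishes after 1 unit; T2 = (6, 9) is still to run.
k = rescale_remaining(elapsed=1, remaining_wc=[6], horizon=9)
print("k =", k, "-> T2 finishes at", 1 + k * 6)
```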
Intra Task Energy Management
• Intra-task DVS: adjusts the voltage and
clock speed within a task.
• Identifies the slack time generated within
a task due to workload variation.
• Application code is preprocessed to
enable the run-time clock/voltage
adjustment.
Intra-task DVS
• Intra-task DVS algorithms typically work with the control flow graph (CFG) of the real-time program.
• Each node in the CFG denotes a basic block of computation.
• The edges in the CFG indicate the control dependency between the blocks.
• The objective is to assign a proper clock frequency to each basic block so as to minimize the total energy consumption while meeting the task deadline.
Figure: Intra-task RT-DVS on a CFG, with voltage scheduling points marked. Block costs: B1 = 20, B2 = 20, B3 = 10, B4 = 10, B5 = 150. Deadline = 200.
Different paths:
P1: B1, B2.
P2: B1, B3, B4.
P3: B1, B3, B5.
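The paths listed above can be enumerated from the CFG mechanically. This sketch assumes an acyclic CFG and uses block costs and edges as read from the slide's figure, which is an interpretation of the diagram; all names are hypothetical.

```python
# Block costs and edges as read from the slide's figure (an interpretation).
COST = {"B1": 20, "B2": 20, "B3": 10, "B4": 10, "B5": 150}
EDGES = {"B1": ["B2", "B3"], "B3": ["B4", "B5"]}
DEADLINE = 200


def enumerate_paths(node, prefix=()):
    """Yield every entry-to-exit path of the (acyclic) CFG as a tuple of blocks."""
    path = prefix + (node,)
    successors = EDGES.get(node, [])
    if not successors:
        yield path
    for nxt in successors:
        yield from enumerate_paths(nxt, path)


paths = list(enumerate_paths("B1"))
lengths = {p: sum(COST[b] for b in p) for p in paths}
worst = max(lengths.values())
for p, n in lengths.items():
    print(" -> ".join(p), "=", n)
print("worst case:", worst, "slack before deadline:", DEADLINE - worst)
```

The gap between the worst-case path and the shorter paths is the slack that an intra-task scheme tries to turn into lower frequencies once a branch is resolved.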
Simple Intra-task DVS: example
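Figure: CFG with B1 = 20 branching to B2 = 20 or B3 = 10. Deadline = 40.
Figure: Frequency versus time at Fmax. Path B1, B2: 40 units at Fmax, finishing at time 40. Path B1, B3: 30 units at Fmax, finishing at time 30.
At time = 20, we know the exact branch.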
Simple Intra-task DVS: example
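Figure: CFG with B1 = 20 branching to B2 = 20 or B3 = 10. Deadline = 40.
Figure: Frequency versus time. Path B1, B2: 40 units at Fmax, finishing at time 40. Path B1, B3: B1 runs its 20 units at Fmax, then B3 runs its 10 units at Fmax/2, finishing at time 40.
At time = 20, we know the exact branch, so the remaining block can be slowed down to just meet the deadline.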
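The decision taken at the branch point can be sketched as follows. It assumes B1 always runs at Fmax and that any fraction of Fmax is available afterwards; the function name is hypothetical, and the block costs are those of the example.

```python
from fractions import Fraction

COST = {"B1": 20, "B2": 20, "B3": 10}   # block lengths at Fmax, from the slide
DEADLINE = 40


def speed_after_branch(taken_block, now):
    """Fraction of Fmax needed to finish `taken_block` exactly at the deadline."""
    time_left = DEADLINE - now
    if time_left <= 0:
        raise ValueError("deadline already passed")
    speed = Fraction(COST[taken_block], time_left)
    if speed > 1:
        raise ValueError("block cannot finish in time even at Fmax")
    return speed


now = COST["B1"]                     # B1 ran at Fmax, so the branch is known at t = 20
for block in ("B2", "B3"):
    print(block, "runs at", speed_after_branch(block, now), "x Fmax")
```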
Summary
• DVS schemes can significantly reduce energy
in embedded systems.