EE5900 Advanced Embedded System For Smart Infrastructure
Energy Efficient Scheduling

Introduction
• Energy consumption is an important issue in embedded systems.
  – Mobile and portable devices.
  – Laptops, PDAs.
  – Mobile and intelligent systems: digital camcorders, cellular phones, and portable medical devices.
• A typical networked embedded system consists of
  – Computing subsystem: driven by an embedded processor operated by an RTOS.
  – Communication subsystem: a radio chipset driven by firmware.

A typical Embedded System
[Figure: a battery powering two subsystems]
• Computing subsystem (driven by RTOS): microprocessor, digital signal processor (DSP)
• Communication subsystem (driven by firmware): radio, RF amplifiers, A-to-D and D-to-A circuits

Important Facts (1)
• High performance is needed only for a small fraction of the time; for the rest of the time, a low-performance, low-power processor would suffice.
[Figure: workload versus time. The peak computing rate is needed only briefly; the average rate would suffice otherwise.]

Important Facts (2)
• Processors are based on CMOS technology, where dynamic power is the bottleneck.
• Dynamic power (due to switching activity):
  – $P \propto V^2 \cdot f$
  – $V \propto f$
  – $E = P \cdot T_{cc}$, with $T_{cc} = CC / f$
  – Combining these, $P \propto f^3$, so for task $T_i$: $E_i = K \cdot CC_i \cdot f^2$
• Where V: voltage; P: power; E: energy; $T_{cc}$: execution time; $CC_i$: number of clock cycles of task $T_i$; f: frequency at which $T_i$ is run; K: a proportionality constant.

Variable Voltage Processors
• Modern processors operate at multiple frequency levels.
  – Crusoe Processor: Transmeta Corporation
  – PowerNow! Technology: AMD
  – Intel XScale: Intel
• The higher the frequency level, the higher the energy consumption.

Dynamic Voltage Scaling (DVS)
• DVS scales the operating voltage of the processor along with the frequency.
• Since energy is proportional to $f^2$, DVS can potentially provide significant energy savings through frequency and voltage scaling.
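The energy model above can be checked numerically with a minimal sketch. It assumes a normalized frequency (Fmax = 1.0), a proportionality constant K = 1, and a hypothetical cycle count of 1000; none of these values come from the slides, and the function names are illustrative only.

```python
def execution_time(cycles: float, freq: float) -> float:
    """T_cc = CC / f: time to run `cycles` clock cycles at frequency `freq`."""
    return cycles / freq


def dynamic_energy(cycles: float, freq: float, k: float = 1.0) -> float:
    """E = K * CC * f^2, which follows from P ∝ V^2 * f, V ∝ f and E = P * T_cc."""
    return k * cycles * freq ** 2


if __name__ == "__main__":
    cycles = 1000.0  # hypothetical workload, not taken from the slides
    f_max = 1.0      # normalized maximum frequency (assumption)
    for scale in (1.0, 0.5):
        f = f_max * scale
        print(
            f"f = {f:.2f} * Fmax: "
            f"time = {execution_time(cycles, f):.1f}, "
            f"energy = {dynamic_energy(cycles, f):.1f}"
        )
```

Under this model, halving the frequency doubles the execution time and cuts the dynamic energy to one quarter. Static (leakage) power is ignored here, as it is in the model above.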

Case study (iPhone 5)
• iPhone 5's power management system
[Figure: block diagram of the power management system]
  – Battery: 3.8 V, 5.45 Wh, 1440 mAh
  – DC/DC down converter
  – LDO (low drop-out) regulator
  – Computation system (operated by RTOS): multiprocessor (A6), memories
  – Communication system (operated by firmware): RF modem, power amplifier

Simple DVS-Scheme
[Figure: task queue, DVS module, next task]
• Overloaded system: f = F
• Underloaded system: f = F/2

DVS-example
• Consider a task Ti with a computation time of 20 units.
• Energy of Ti without DVS:
  – $E_1 = K \cdot 20 \cdot F^2$
  – Time taken = $t_1$ (say)
• Energy of Ti with DVS:
  – $E_2 = K \cdot 20 \cdot (F/2)^2$
  – Time taken = $t_2 = 2 \cdot t_1$
• Clearly, $E_2 = E_1 / 4$.
• Therefore, reducing the frequency saves energy, but the same computation takes longer.

Energy-Time Tradeoffs
[Figure: energy savings versus time]

Simple DVS scheme handling RT-task
• Consider a real-time task T1 = (20, 30), that is, a computation time of 20 units and a deadline of 30.
• Applying the simple DVS scheme, either
  – T1 runs at the maximum frequency (F) and meets the deadline with no energy savings, or
  – T1 runs at half the maximum frequency (F/2) and completes at time = 40, thereby missing its deadline.

Simple DVS scheme handling RT-task
[Figure: frequency versus time for the two cases]
• No DVS: 20@F, T1 completes at time 20, before the deadline at 30.
• DVS, low workload: 20@(F/2), T1 completes at time 40, after the deadline at 30.
• Inference: DVS cannot be blindly applied to real-time embedded systems.

Energy aware scheduling in RT Systems
• Objectives
  – Minimizing energy consumption
  – Meeting the deadlines

Real Time - DVS schemes
The RT-DVS algorithms can be broadly classified, based on the granularity at which voltage scheduling is performed, as follows:
• Inter-task DVS scheme: voltage scheduling is done on a task-by-task basis.
  [Figure: task sequence T1, T3, T2 with the voltage scheduling points marked]
• Intra-task DVS scheme: voltage scheduling is done within a task boundary.
  [Figure: tasks shown in segments: T1… …T1, T2… T3 …T2]

Inter-task EDF
• Static voltage scaling EDF
• Cycle conserving RT-DVS

Static Voltage Scaling EDF: Motivation
[Figure: pre-run schedule with holes; jobs wc1, wc2, wc3, wc4 before the next arrival of T1]
• $wc_i$ = worst-case computation time of task $T_i$ at $F_{max}$; $p_i$ = period of $T_i$.
• Holes in the pre-run schedule imply the EDF test: $\sum_i (wc_i / p_i) < 1$ at frequency = $F_{max}$.
• In other words, whenever $\sum_i (wc_i / p_i) < 1$, there are holes in the EDF schedule.

Static Voltage Scaling EDF: exploiting holes
[Figure: the same pre-run schedule with holes]
• The processor typically idles during holes.
• Instead, the holes can be exploited to slow down the processor and save energy.

Static Voltage Scaling EDF
[Figure: each job $wc_i$ is stretched to $K \cdot wc_i$, up to the next arrival of T1]
• EDF test: $\sum_i (wc_i / p_i) < 1$ at maximum frequency = $F_{max}$
• Static-VS EDF test: $K \cdot \sum_i (wc_i / p_i) = 1$ at frequency = $F_{max} / K$

Static voltage scaling: Example
• Task set: T1 = (1, 4) and T2 = (2, 8)
• U = 1/4 + 2/8 = 0.5 (< 1) at Fmax
• What is the k at which the task set is still schedulable at Fmax/k?
  – Let k = x
  – U = (1·x)/4 + (2·x)/8 = 0.5·x = 1
  – x = 2, that is, k = 2
  – Therefore, we can operate at f = Fmax/2 and still meet the deadlines.

Static voltage scaling: Example
• Task set: T1 = (1, 4) and T2 = (2, 8)
• U = 1/4 + 2/8 = 0.5 (< 1) at Fmax
[Figure: frequency versus time at Fmax; time axis marks 0, 1, 3, 4, 5, 8]
• Finding the right frequency scaling parameter (say, k): U = (1·k)/4 + (2·k)/8 = 0.5·k = 1 at Fmax/k
• This gives k = 2. Therefore, operating frequency = Fmax/2.

Static voltage scaling: Example
• Modified task set at Fmax/2: T1 = (2, 4) and T2 = (4, 8)
• U = 2/4 + 4/8 = 1 at Fmax/2
[Figure: frequency versus time at Fmax; time axis marks 0, 1, 3, 4, 5, 8]
• Energy consumption at Fmax (one job of each task, constant K omitted): $1 \cdot F^2 + 2 \cdot F^2 = 3F^2$
[Figure: frequency versus time at Fmax/2; time axis marks 0, 2, 6, 8]
• Energy consumption at Fmax/2 (same jobs): $1 \cdot (F/2)^2 + 2 \cdot (F/2)^2 = \frac{3}{4} F^2$

What if Ci < WCi?
[Figure: actual computation times K·c1, K·c2, K·c3, K·c4 before the next arrival of T1]
• More holes are left unexploited.

What if Ci < WCi?
[Figure: task T1 completes after its actual computation time K·c1; the remaining jobs K·wc2, K·wc3, K·wc4 run before the next arrival of T1]
• Hole of size = (wc1 - c1)
• Slow down all the remaining tasks proportionally.

What if Ci < WCi? (contd.)
[Figure: K·c1 followed by K'·wc2, K'·wc3, K'·wc4 before the next arrival of T1]
• CPU cycles are conserved by slowing down the remaining tasks.

Cycle conserving EDF: Example
• Task set: T1 = (3, 6) and T2 = (6, 12)
• U = 3/6 + 6/12 = 1 at Fmax
• What is the k at which the task set is still schedulable at Fmax/k?
  – Let k = x
  – U = (3·x)/6 + (6·x)/12 = x·(1.0) = 1
  – x = 1, that is, k = 1
  – Therefore, we should operate at f = Fmax in order to meet all the deadlines.

Cycle conserving EDF: Example
• Task set at Fmax: T1 = (3, 9) and T2 = (6, 9)
• U = 3/9 + 6/9 = 1 at Fmax
[Figure: frequency versus time at Fmax, showing T1 followed by T2; time axis marks 0, 1, 3, 6, 9]
• Task T1 completes after just one unit, creating holes.
[Figure: frequency versus time at Fmax with the shortened T1; time axis marks 0, 1, 3, 6, 9]

Cycle conserving EDF: Example
• Task set at Fmax: T1 = (3, 9) and T2 = (6, 9)
• U = 3/9 + 6/9 = 1 at Fmax
• Task T1 completes after just one unit, creating holes.
• New utilization = 1/9 + 6/9 = 7/9
• Finding the right k: 1/9 + (6·k)/9 = 1, which gives k = 4/3.
• This is the right factor: T2 can run at Fmax/k = (3/4)·Fmax.
[Figure: frequency versus time before and after slowing down T2; time axis marks 0, 1, 3, 6, 9, 12]

Intra Task Energy Management
• Intra-task DVS adjusts the voltage and clock speed within a task.
• It identifies the slack time generated within a task due to workload variation.
• Application code is preprocessed to enable run-time clock/voltage adjustment.

Intra-task DVS
[Figure: control flow graph for intra-task RT-DVS with voltage scheduling points; node labels include B1 = 20 and B2 = 20]
• Intra-task DVS algorithms typically work with the control flow graph (CFG) of real-time programs.
• Each node in the CFG denotes a basic block of computation.
• The edges in the CFG indicate the control dependencies between the blocks.
• The objective is to assign a proper clock frequency to each basic block so as to minimize the total energy consumption while meeting the task deadline.
• Different paths:
  – P1: B1, B2
  – P2: B1, B3, B4
  – P3: B1, B3, B5
[Figure, continued: node labels B3 = 10, B4 = 10, B5 = 150; Deadline = 200]

Simple Intra-task DVS: example
[Figure: CFG with B1 (20) branching to B2 (20) or B3 (10); Deadline = 40]
• Worst-case path B1, B2: 40@Fmax, completing at time 40.
• Path B1, B3: 30@Fmax, completing at time 30.
• At time = 20, we know the exact branch.

Simple Intra-task DVS: example
[Figure: CFG with B1 (20) branching to B2 (20) or B3 (10); Deadline = 40]
• Worst-case path B1, B2: 40@Fmax, completing at time 40.
• Path B1, B3 with intra-task DVS: 20@Fmax, then 10@(Fmax/2) from time 20 to time 40.
• At time = 20, we know the exact branch.

Summary
• DVS schemes can significantly reduce energy in embedded systems.
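The simple intra-task DVS example (B1 = 20, then B2 = 20 or B3 = 10, deadline = 40) can be reproduced with a minimal sketch. It assumes block lengths are execution times at Fmax, that frequency can be set to any fraction of Fmax (real processors offer only discrete levels), and that switching overhead is zero. The function name and its parameters are hypothetical, introduced only for illustration.

```python
def slowdown_speed(remaining_at_fmax: float, now: float, deadline: float) -> float:
    """Lowest normalized speed (f / Fmax) that still finishes by the deadline.

    remaining_at_fmax: execution time of the remaining work if run at Fmax.
    Returns a value in (0, 1]; raises if the deadline cannot be met even at Fmax.
    """
    window = deadline - now
    if window <= 0:
        raise ValueError("no time left before the deadline")
    speed = remaining_at_fmax / window
    if speed > 1.0:
        raise ValueError("deadline cannot be met even at Fmax")
    return speed


if __name__ == "__main__":
    blocks = {"B1": 20.0, "B2": 20.0, "B3": 10.0}  # execution times at Fmax
    deadline = 40.0

    # B1 runs at Fmax, so the branch outcome is known at time 20.
    now = blocks["B1"]
    for branch in ("B2", "B3"):
        speed = slowdown_speed(blocks[branch], now, deadline)
        finish = now + blocks[branch] / speed
        print(f"{branch}: run at {speed:.2f} * Fmax, finishes at time {finish:.0f}")
```

With these numbers, the B2 branch has no slack and must stay at Fmax, while the B3 branch can run at Fmax/2 and still finish exactly at the deadline.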