Energy-Efficient Soft Real-Time CPU Scheduling for Mobile Multimedia Systems Authors: Wanghong Yuan, Klara Narhstedt Appears in SOSP 2003 Presented by: Samuel Kim Table of Contents • • • • • • • About the Authors Introduction Algorithm Implementation Results Related Works Summary About the Authors • Wanghong Yuan ▫ B.S., M.S. Peking University ▫ Ph.D. University of Illinois at Urbana-Champaign ▫ Software Engineer at Google • Klara Nahrstedt ▫ Ph.D. University of Pennsylvania ▫ Professor at University of Illinois at UrbanaChampaign Table of Contents • • • • • • • About the Authors Introduction Algorithm Implementation Experimental Results Related Works Conclusion Introduction • Multimedia Becoming A Standard in Mobile Computing ▫ Audio ▫ Video ▫ Data • Goal on Mobile Systems ▫ Manage System Resources Quality of Service - High Performance Energy Efficiency - Battery Life Greater Control Over System Resources • Hardware Adaptability ▫ CPU Voltage Scaling ▫ E = a•C•V•f2•t • Software Adaptability ▫ Application Quality levels ▫ Statistical Performance Requirements Soft Real-Time guarantees How Do We Approach System Resource Management? ▫ Adapt resources based on system layers Adaptive Layers Application Operating System /Network Hardware ▫ Most approaches in research adapt a single layer ▫ Possible to adapt across multiple layers? Multiple Layer Adaptation Requires Coordination • Conflict Adapting Multiple System Layers ▫ Scale down CPU ▫ Increase the application QoS • Different Objectives ▫ Minimize Energy Consumption ▫ Maximize Quality/Performance • Coordinate Objectives at a Higher Level The Purpose of GRACE Framework: Cross-Layer Adaptation • Global Resource Adaptation via CoopEration GRACE Current Approaches Operating System Operating System Architecture, Hardware Coordinator Application Network Protocols Architecture, Hardware Application Network Protocols • Adaptation over 1 or 2 layers • Global cooperation of resources Figures from S. Adve. “The Illinois GRACE Project: Global Resource Adaptation through CoopEration”, Workshop on Self-Healing Adaptive and Self-Managed Systems, 2002 GRACE-OS: Enhanced CPU Scheduler • Previous Methods ▫ Soft Real-Time (SRT) Scheduling ▫ Dynamic Voltage Scaling (DVS) • GRACE-OS ▫ DVS is integrated into the CPU Scheduler ▫ Continue to keep performance guarantees of SRT Scheduling Table of Contents • • • • • • • About the Authors Introduction Design and Algorithm Implementation Experimental Results Related Works Conclusion Design of GRACE-OS • Profiler ▫ How to estimate cycle usage? Monitor CPU cycle usage of a task Estimate demand by online profiling • SRT Scheduler ▫ How to allocate CPU Resources? Allocate CPU cycles to task based on profiler • Speed Adapter ▫ How to set CPU Speed/Voltage? Set CPU to minimum required speed based on #cycles allocated Algorithm: Profiler • How to estimate the cycle usage? • Estimate based on statistical distribution instead than instantaneous demand ▫ More stability in CPU speeds ▫ Meets performance requirements of SRT • Profile during run-time Algorithm: SRT Scheduler • Determine which task to execute ▫ When and how long (# CPU cycles) • Grace-OS is a stochastic scheduler ▫ Decide # cycles to allocate based on: Performance requirement, p Demand distribution of task ▫ F(C) = P[X ≤ C] ≥ p X, # cycles required for task C, # cycles allocated to task Algorithm: Dynamic DVS • As cycle number increases, CPU accelerates • Minimize energy consumption ▫ Constraint: CPU period is less than period allocated for task • Frequency a function of cycle count ▫ ∀𝑓𝑏𝑖 = 𝑚 𝑗=0 𝑠𝑗3(1−𝐹(𝑏𝑗)) 𝑠𝑖 (1−𝐹 𝑏𝑖 ) Table of Contents • • • • • • • About the Authors Introduction Algorithm Implementation Experimental Results Related Works Conclusion Implementation • Testbed ▫ HP Pavilion N5470 Laptop (Athlon Processor) ▫ Red Hat Linux 7.2 ▫ Modified Linux kernel 2.4.18 (GRACE-OS) • Software Architecture of Implementation Implementation • System calls added to support SRT tasks ▫ ▫ ▫ ▫ ▫ start_srt – start real-time mode exit_srt – exit real-time mode finish_job – tell scheduler that task finished job set_budget – allocate cycles for task set_dvspnt – set CPU speed in task’s speed schedule • Modifying the process control block ▫ 5 attributes Table of Contents • • • • • • • About the Authors Introduction Algorithm Implementation Experimental Results Related Works Conclusion System Call Overhead • System Calls: 900-1300 cycles • Multimedia Processing: 2x105 - 2x108 cycles ▫ 0.0004% - 0.5% of cycles per job Profiling and Estimation Overhead • Profiling Cost: 26-38 cycles • Overhead for online demand estimation is high (0.1% - 100% of cycles per job) ▫ Demand estimation should be infrequent ▫ Stable models allow for infrequent estimation Figure: Cost of Demand Estimation Speed Scaling Overhead • Costs 8,000 to 16,000 cycles (~10-50 us) • Should be invoked infrequently (500 us in GRACE-OS) • Speed change overhead should improve with processor design Stability of Demand Distribution • Codec: mpgplay a) Cycle usage varies greatly b) Demand distribution remains stable Stability of Demand Distribution (Other Codecs) Efficiency of GRACE-OS • Compare to other allocation schemes • Running Single Applications ▫ Misses deadlines 0.3%-0.6% ▫ 92% CPU busy time at lowest CPU speed ▫ 53.4%-71.6% reduction in energy • Running Multiple Applications ▫ Misses deadlines 4.9% ▫ 83.8% CPU busy time at lowest CPU speed CPU Usage for Multiple Applications • Dynamic DVS spends more time in lowest CPU speed than other DVS schemes Energy Efficiency of GRACE-OS • toast and madplay – Low CPU demand • GRACE-OS savings limited by CPU settings Impact of Setting Performance p • Normalized energy increases p = 0.5 to p = 0.95 • Fewer energy savings p = 0.95 to p = 1.0 ▫ Need more CPU settings Impact of p on Normalized Energy Impact of Mixed Workload • Extra allocation to extra best-effort applications increases energy consumption ▫ Less time for each application ▫ Increases total CPU demand Impact of Mixed Workload Table of Contents • • • • • • • About the Authors Introduction Algorithm Implementation Experimental Results Related Works Conclusion Related Works: Soft Real-Time Scheduling • Proportional Sharing ▫ A. Chandra, M. Adler, P. Goyal, and P. Shenoy. Surplus fair scheduling: A proportionalshare CPU scheduling algorithm for symmetric multiprocessors. In Proc. of 4th Symposium on Operating System Design and Implementation, Oct. 2000. • CPU Reservations ▫ M. Jones, D. Rosu, and M. Rosu. CPU reservations & time constraints: Efficient, predictable scheduling of independent activities. In Proc. of 16th Symposium on Operating Systems Principles, Oct. 1997. • Real-Time Scheduling Algorithms ▫ C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. JACM, 20(1):46–61, Jan. 1973. • Stochastic Scheduling ▫ K. Gardner. Probabilistic analysis and scheduling of critical soft real-time systems. PhD thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1999. Related Works: Dynamic Voltage Scaling • General Purpose DVS based on Average CPU Utilization ▫ D. Grunwald, P. Levis, K. Farkas, C. Morrey III, and M. Neufeld. Policies for dynamic clock scheduling. In Proc. of 4th Symposium on Operating System Design and Implementation, Oct. 2000. • Real Time DVS ▫ P. Pillai and K. G. Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. In Proc. of 18th Symposium on Operating Systems Principles, Oct. 2001. • Stochastic DVS ▫ J. Lorch and A. Smith. Improving dynamic voltage scaling algorithms with PACE. In Proc. of ACM SIGMETRICS 2001 Conference, June 2001. ▫ F. Gruian. Hard real-time scheduling for low energy using stochastic data and DVS processors. In Proc. Of Intl. Symp. on Low-Power Electronics and Design, Aug. 2001. Table of Contents • • • • • • • About the Authors Introduction Algorithm Implementation Experimental Results Related Works Conclusion Conclusion • Pros ▫ ▫ ▫ ▫ ▫ Optimizes multiple layers of system resources Conserve energy while ensuring quality of service Small overhead Support for multiple tasks Thorough testing • Cons ▫ Estimate energy savings without measurement ▫ Testing limited to multimedia applications ▫ Limited number of tests per codec 8 runs per test Discard largest and smallest values ▫ Limited CPU speed settings decreases energy savings