Power Management in Multicores
Minshu Zhao

Outline
  Introduction
  Review of power management techniques
  Power management in multicores
    ◦ Identify multicore characteristics
    ◦ Apply power management techniques
        To cores
        To caches
  Future of multicore

Review of low-power techniques
  Clock gating
    ◦ + Gating can be applied at fine granularity
    ◦ + Saves dynamic power
    ◦ - Does not affect static power
    [Figure: flip-flop (FF) with clock CK gated by enable EN]
  Power gating
    ◦ + Saves both dynamic and static power
    ◦ - Needs microseconds to power up again
    ◦ - Loses data, or needs some form of state retention
    [Figure: flip-flop (FF) with supply Vdd gated by enable EN]

Review of low-power techniques
  Voltage (frequency) scaling
    ◦ Scale down frequency and/or voltage, sacrificing performance for power
        $I \propto (V_{dd} - V_t) \sim V_{dd}$
        $f \propto V_{dd}$
        $P \propto C V_{dd}^2 f \propto V_{dd}^3$
  Variable device threshold
    ◦ Use high-Vt transistors to reduce leakage
    ◦ + Reduces leakage
    ◦ - Vt is generally fixed for a given transistor

Identify Multicore Characteristics
  Half of the chip is cores
    ◦ Large dynamic power
    ◦ Unbalanced power consumption among cores
  The other half of the chip is cache
    ◦ Large leakage power

Traditional DVFS
  Motivation
    ◦ Large computation/memory gap
  Problems in applying it to multicores
    ◦ Slow
        Microsecond timescales
    ◦ Coarse-grained adjustment
        Done in the operating system
    ◦ All cores arrive at a single chip-wide VF setting
        Loses potential power savings
  [Figure: power supply, off-chip regulator, Core0, Core1, Core2, Core3]

Per-core DVFS & on-chip regulator
  On-chip vs. off-chip regulator
    ◦ Tens of nanoseconds vs. microseconds
  Per-core vs. chip-wide DVFS
    ◦ Benefits heterogeneous workloads
  [Figure: power supply, off-chip regulator, on-chip regulator, Core0, Core1, Core2, Core3]
  Wonyoung Kim, M. S. Gupta, Gu-Yeon Wei, and D. Brooks. 2008. System level analysis of fast, per-core DVFS using on-chip switching regulators. In High Performance Computer Architecture (HPCA 2008).

Per-core DVFS & on-chip regulator
  Application
    ◦ Multi-core global power management
        Monitor power and performance
        Apply policies through per-core DVFS
  Problem
    ◦ Overhead is large

Thread Motion
  Cores have different voltage-frequency (VF) settings
  Threads migrate between cores
  Applies the benefits of DVFS to program variability by observing microarchitectural events
  Fast movement between cores creates an effective voltage level
  [Figure: activity vs. time for App A and App B; labels: low IPC, high IPC, high-VF, low-VF]
  Krishna K. Rangan, Gu-Yeon Wei, and David Brooks. 2009. Thread motion: fine-grained power management for multi-core systems. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09).

Thread Motion
  Application
    ◦ Thread Motion framework
        Evaluation driven by microarchitectural events
          Time-driven
          Miss-driven
        Predict IPC for the next interval
        Move the thread if needed
  Problem
    ◦ Potential cache penalty
        Clustered multicore with a shared L1 cache within each cluster
    ◦ Register file transfer penalty
        Store the register state in the shared cache

Heterogeneous Cores
  Motivation
    ◦ Different applications have different resource requirements
        Large ILP -> VLIW
    ◦ Different power conditions
        Full battery vs. low battery
  Combine existing processor architectures and select the core that minimizes energy
  Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. 2004. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA '04).
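To make the per-core vs. chip-wide DVFS comparison from the earlier slides concrete, here is a minimal sketch that uses only the first-order model from the review slide (f ∝ Vdd and P ∝ C·V²·f, hence P ∝ f³ in normalized units). The function names and the per-core frequency demands are hypothetical, and the model ignores leakage, regulator losses, and transition overhead, so the numbers are illustrative rather than measured.

```python
def dvfs_power(freq):
    """Relative dynamic power of one core at normalized frequency `freq`.

    Assumes f ∝ Vdd and P ∝ C * V^2 * f, so P ∝ f^3 (first-order model).
    """
    return freq ** 3


def chip_wide_power(demands):
    """All cores share one VF setting, chosen for the most demanding core."""
    shared_freq = max(demands)
    return len(demands) * dvfs_power(shared_freq)


def per_core_power(demands):
    """Each core runs at exactly the frequency its own workload needs."""
    return sum(dvfs_power(freq) for freq in demands)


# Hypothetical heterogeneous workload: normalized frequency needed per core.
demands = [1.0, 0.5, 0.5, 0.25]

print("chip-wide:", chip_wide_power(demands))  # 4.0
print("per-core: ", per_core_power(demands))   # 1.265625
```

Under these assumptions the chip-wide setting spends 4.0 units of relative power against about 1.27 for per-core settings. The gap comes entirely from the cores that are forced to run faster than their workload requires, which is why per-core DVFS favors heterogeneous workloads.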

Gated-Vdd cache
  Use a high-Vt transistor to turn off the power supply
    + Reduces power when turned off
    - Data stored in lines that are in low-power mode is lost
  [Figure: SRAM cell between Vdd and Gnd with a gated-Vdd control transistor]
  Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar. 2000. Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories. In Proceedings of the 2000 International Symposium on Low Power Electronics and Design (ISLPED '00). ACM, New York, NY, USA, 90-95.

Gated-Vdd cache
  Application
    ◦ Dynamically resizable i-cache
        Evaluate the miss rate at every time interval and upsize or downsize the cache using gated-Vdd
  Problem
    ◦ Data remapping on the fly
  S. Yang, M. D. Powell, B. Falsafi, K. Roy, and T. N. Vijaykumar. 2001. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In High-Performance Computer Architecture (HPCA 2001).

Gated-Vdd cache
  Application
    ◦ Cache decay
        Turn a cache line off once a set number of cycles (the decay interval) has elapsed since its last access
        The decay interval can adapt to the program
  Problem
    ◦ Data in a turned-off line is lost, so the next access to it suffers a cache miss
  S. Kaxiras, Zhigang Hu, and M. Martonosi. 2001. Cache decay: exploiting generational behavior to reduce cache leakage power. In Proceedings of the 28th Annual International Symposium on Computer Architecture, pp. 240-251.

ABB multi-threshold CMOS
  Increase Vsb in sleep mode
  Effectively raises the threshold voltage (Vt) to reduce leakage
    + State is preserved in sleep mode
    - Needs a long time to switch out of sleep mode
  [Figure: SRAM circuit with bias voltages labeled 1.0V, 1.0V / 3.3V, 0V / 1.0V, 0V]
  K. Nii, et al. 1998. A low power SRAM using auto-backgate-controlled MT-CMOS. In Proc. of Int. Symp. Low Power Electronics and Design, pp. 293-298.
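The cache decay policy on the earlier slide (turn a line off once the decay interval passes without an access, at the price of a miss on the next access) can be sketched as a small trace-driven model. This is an illustration under simplifying assumptions, not the mechanism from the cited paper: the function name, the trace format, and the unlimited-capacity cache with no evictions are all hypothetical.

```python
def simulate_cache_decay(trace, decay_interval):
    """Classify each access in `trace` under a simple cache decay policy.

    trace: list of (cycle, line_address) pairs in increasing cycle order.
    decay_interval: cycles without an access after which a line is gated off.
    Assumes an unlimited-capacity cache, so there are no capacity evictions.
    """
    last_access = {}  # line address -> cycle of its most recent access
    stats = {"hits": 0, "cold_misses": 0, "decay_misses": 0}

    for cycle, line in trace:
        if line not in last_access:
            # First touch: the line was never in the cache.
            stats["cold_misses"] += 1
        elif cycle - last_access[line] > decay_interval:
            # The line was gated off while idle, so its data is gone.
            stats["decay_misses"] += 1
        else:
            stats["hits"] += 1
        last_access[line] = cycle

    return stats


# Hypothetical access trace: (cycle, line address).
trace = [(0, "A"), (10, "A"), (20, "B"), (500, "A"), (510, "B")]

print(simulate_cache_decay(trace, decay_interval=100))
# {'hits': 1, 'cold_misses': 2, 'decay_misses': 2}
print(simulate_cache_decay(trace, decay_interval=1000))
# {'hits': 3, 'cold_misses': 2, 'decay_misses': 0}
```

A short decay interval keeps lines powered for less time but turns late reuses into misses, while a long one preserves hits and saves less leakage. That trade-off is what an adaptive decay interval tries to balance per program.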

Drowsy Caches
  Apply DVFS-style voltage scaling to the cache
    + Wake-up cost is small
    + State is preserved
    - Does not save as much leakage power
  [Figure: SRAM cell with supply selectable between Vdd (1V) and drowsy (0.3V)]
  Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, and Trevor Mudge. 2002. Drowsy caches: simple techniques for reducing leakage power. In Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA '02). IEEE Computer Society, Washington, DC, USA, 148-157.

Drowsy Caches
  Application
    ◦ Simple policy
        Put all lines into drowsy mode periodically and wake each one up when it is accessed
    ◦ No-access policy
        Put into drowsy mode only the lines that were not accessed during the window
    ◦ 90% of the lines can be in drowsy mode
  Average results
    Normalized total energy      0.46
    Normalized leakage energy    0.29
    Run time increase            0.41%
  Problem
    Leakage power
      Drowsy cache    6.24 nW
      Gated-Vdd       0.02 nW

Future multicore
  Dark silicon (transistor under-utilization)
    ◦ Power constraints
        Power down transistors to reduce power
    ◦ Memory wall
        Computation stalls while waiting for memory
    ◦ Lack of parallelism
        Not enough work to keep the transistors busy
  Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA '11).

Future multicore
  Power constraints
    ◦ New device: FinFET
  Memory wall
    ◦ New technology: 3D IC
  Lack of parallelism
    ◦ Automatic parallelization

Thank you!