Department of Computer Science

Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures
Emily Blem, Jaikrishnan Menon, and Karthikeyan Sankaralingam

What role if any does RISC vs. CISC play in this power struggle?

ISA being RISC or CISC does not matter for modern microprocessors

Overview
- Methods
- 11 Key Findings
  - 6 on performance
  - 3 on power
  - 2 on power/performance tradeoffs
- Conclusion

Platforms
- BeagleBoard: ARM Cortex A8
- PandaBoard: ARM Cortex A9
- Intel Atom N450
- Intel Sandy Bridge Core i7
- Software: Linux 2.6, GCC

Workloads
- Mobile: CoreMark, WebKit
- Desktop: SPEC CPU2006
- Server: Lighttpd, CLucene, database kernels

Measurements
- Performance measurement on real hardware
- Extensive use of performance counters: cycles, instructions, cache misses, branch misses…
- Power measurements using Wattsup meters

But first… What is performance?
T = N × CPI × (1/f)
“Iron Law of Performance” (Clark)

Performance
[Figure: normalized time for A8, Atom, A9, and i7 across Mobile, SPEC-INT, SPEC-FP, and Server workloads; bar annotations: (130), (72), (24), (344)]
Key Finding 1: Large performance differences due to varying clock frequencies and core characteristics

Cycle counts
[Figure: normalized cycles for A8, Atom, A9, and i7 across Mobile, SPEC-INT, SPEC-FP, and Server workloads]
Key Finding 2: Cycle count differences are less than 2.5X

Instruction counts
[Figure: normalized macro-ops for ARM and x86 across Mobile, SPEC-INT, SPEC-FP, and Server workloads]
Macro-op counts are nearly the same across ARM and x86
Key Finding 3: CPI is less for x86 implementations

Instruction Mix
Key Finding 4: ISA effects are indistinguishable

Key Findings
1. Large performance gaps across cores
2. After accounting for clock frequency, performance gaps within 2.5X
3. CPI is less for x86 implementations
4. ISA effects are indistinguishable

Why are performance gaps present?
[Figure: normalized cycle count per benchmark; legend: Instruction Count, Cache Related, Branch Related, Issue Width Related]

Case study: omnetpp
[Figure: cycle count (billions) for i7 and A9; legend: Insts, Branch Misses, I-Cache, Issue Width]
ISA Effect: ARM has 4% more instructions
Microarchitecture Effect 1: A9 experiences 29x more branch mispredictions
Microarchitecture Effect 2: A9 experiences 15x more instruction cache misses
Microarchitecture Effect 3: A9’s issue width is half that of i7’s
Key Finding 5: Performance gaps due to microarchitecture

Key Findings
1. Large performance gaps across cores
2. After accounting for clock frequency, performance gaps within 2.5X
3. CPI is less for x86 implementations
4. ISA effects are indistinguishable
5. Performance gaps due to microarchitecture
6. RISC or CISC choice does not play a role in performance-driving µarch decisions
Details in paper

Power and Energy
1. Large performance gaps across cores
2. After accounting for clock frequency, performance gaps within 2.5X
3. CPI is less for x86 implementations
4. ISA effects are indistinguishable
5. Performance gaps due to microarchitecture
6. RISC or CISC choice does not play a role in performance-driving µarch decisions
7. x86 implementations are higher power, dictated by performance targets
8. Power consumption is tied to microarchitectural design decisions
9. Energy consumption also tied to microarchitectural design decisions

Power-Performance Tradeoffs
[Figure: power (W) vs. performance (MIPS) for i7, Cortex A8, Cortex A9, and Atom]
Key Finding 10: Regardless of ISA, processors follow cubic power/performance trends

Energy-Delay analysis
- Considering ED, A15 is 46% lower than any other design we considered.
- Considering ED^2, i7 is more than 2X better than next best design
- Considering ED^x for x > 1.4, i7 is best
Key Finding 11: Microarchitecture and design choices are key, not the ISA

Conclusion
ISA being RISC or CISC does not matter for power and performance of modern processors.

What is the ISA’s role?
- Supporting specialization: AVX crypto, virtualization extensions, Jazelle DBX, ARM TrustZone…
- Exposing more workload-specific semantic information to the substrate
- Transactional Memory support
- Reliability-oriented extensions
- Many more…

Questions?
Additional resources (detailed report and raw data spreadsheet) available at http://research.cs.wisc.edu/vertical/isa-power-struggles
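
Supplementary sketch: the Iron Law and the energy-delay metrics used in the talk can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' tooling. The core names, instruction counts, CPI values, frequencies, and power numbers are made-up placeholders rather than measurements from the study, and the helper names (execution_time, energy_delay, best_core) are hypothetical. The metric is assumed to be E × D^x with E = P × T and D = T, so x = 0 gives plain energy, x = 1 gives ED, and x = 2 gives ED^2.

```python
def execution_time(instructions, cpi, freq_hz):
    """Iron Law: T = N x CPI x (1/f), in seconds."""
    return instructions * cpi / freq_hz


def energy_delay(power_w, time_s, exponent=1.0):
    """Return E * D**exponent, assuming E = P * T (joules) and D = T (seconds)."""
    energy_j = power_w * time_s
    return energy_j * time_s ** exponent


# Hypothetical cores; every number here is a placeholder, not measured data.
cores = {
    "low_power_core": {"instructions": 1e9, "cpi": 2.0, "freq_hz": 1e9, "power_w": 1.0},
    "high_perf_core": {"instructions": 1e9, "cpi": 0.5, "freq_hz": 2e9, "power_w": 20.0},
}


def metric_for(core, exponent):
    """Compute E * D**exponent for one core description."""
    t = execution_time(core["instructions"], core["cpi"], core["freq_hz"])
    return energy_delay(core["power_w"], t, exponent)


def best_core(cores, exponent):
    """Name of the core with the lowest E * D**exponent (lower is better)."""
    return min(cores, key=lambda name: metric_for(cores[name], exponent))


if __name__ == "__main__":
    for exponent in (0.0, 1.0, 2.0):
        print(f"exponent={exponent}: best is {best_core(cores, exponent)}")
```

With these placeholder numbers the low-power core wins on energy alone, while the faster core wins once delay is weighted in. That is the kind of crossover the energy-delay slide describes, where the preferred design depends on the exponent chosen.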