Department of Computer Science
iGPU: Exception Support and Speculative Execution on GPUs
Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam
Vertical Research Group, University of Wisconsin-Madison
Presented at ISCA 2012

Executive Summary
Compiler/hardware co-design for efficient, general-purpose GPUs
Exception support with 1.5% overhead (no more than 4%)
  Enables demand paging
Context switch support with 2.5% overhead (no more than 4%)
Exploiting speculation provides > 10% energy savings

Outline
Motivation and Background
iGPU Mechanisms
  General exception handling
  Context switching
  Speculation support
iGPU Architecture
  Software
  Hardware
Evaluation
Conclusion

CPU Evolution Retrospective
IBM 360 era: precise exceptions were treated as a performance tradeoff
However, there were two key shifts in processor design:
  Virtual memory was no longer optional
  Speculative execution on ILP processors
Precise exception handling and speculation were key enablers for modern CPUs.

GPU Architectural Trends
A single unified CPU-GPU address space
  Significant interest in supporting demand paging
Emerging necessity for supporting speculation
  More workloads: “irregular” workloads
  Handling reliability problems
GPUs need general-purpose exception and speculation support.

Why Not Just Borrow CPU Ideas?
CPUs use buffering to preserve architectural state: future file, history file, reorder buffer …
But GPUs have 1000x as many registers
Not practical!

Fundamental Challenges
1. A well-defined restart point in the program
   The GPU pipeline and the SIMT model make this hard
2. Preserving architectural state prior to restart
   Need to save 1000s of registers

Key Ideas of Our Solution
1. A well-defined restart point in the program
   Idempotent code regions: restartable regions that produce the same effect when re-executed
   Creation idea: creation of restart points
2. Preserving architectural state prior to restart
   Regions are constructed with small live state: 1 to 3 registers
   Save only this live state
   Preserve idea: preservation of necessary state

Exception Support (Creation idea)
Idempotent regions mark restart points
The register file provides all the required state!
Idempotence guarantees correctness
[Figure: regions A and B; an exception in B invokes the exception handler, after which B re-executes from its start.]
Implicit checkpoints using idempotence

Context Switch
The exception is a page fault
[Figure: regions A and B, with a page fault occurring in B.]
Page-fault handling:
1. Cleanly remove process 1
2. Start another process and execute it
3. Get the page from disk concurrently
4. Restore process 1
5. Restart process 1
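The exception-support idea above depends on one property. A region that never overwrites its own inputs can simply be re-executed from its first instruction. The Python sketch below models this with a hypothetical toy machine, and everything in it is an illustrative assumption rather than the iGPU ISA or the authors' implementation. That includes the instruction tuples, `run_region`, `run_with_restart`, and `PageFault`. The sketch injects a fault mid-region, lets the "handler" do nothing but jump back to the restart PC (RPC), and compares the final registers with a fault-free run.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical toy instruction: (dst, op, src_a, src_b); src_b may be an immediate.
Instr = Tuple[str, str, str, Union[str, int]]


class PageFault(Exception):
    """Models an exception raised partway through a region."""


def run_region(code: List[Instr], regs: Dict[str, int],
               fault_at: Optional[int] = None) -> Dict[str, int]:
    """Execute one region in order; raise PageFault just before instruction fault_at."""
    for pc, (dst, op, a, b) in enumerate(code):
        if pc == fault_at:
            raise PageFault(pc)
        x = regs[a]
        y = regs[b] if isinstance(b, str) else b
        # Toy model: "add" adds; any other op is treated as multiply.
        regs[dst] = x + y if op == "add" else x * y
    return regs


def run_with_restart(code: List[Instr], regs: Dict[str, int],
                     fault_at: int) -> Dict[str, int]:
    """Fault once, then restart at the region's first instruction (the RPC).

    No registers are checkpointed: the register file is assumed to still hold
    the region's inputs, which is only safe if the region is idempotent.
    """
    try:
        run_region(code, regs, fault_at)
    except PageFault:
        pass  # a real handler would service the fault here
    return run_region(code, regs)


if __name__ == "__main__":
    init = {"r0": 3, "r1": 4}
    # Idempotent: never overwrites its inputs r0 and r1.
    idem: List[Instr] = [("r2", "add", "r0", "r1"), ("r3", "mul", "r2", 2)]
    # Not idempotent: the first instruction overwrites its own input r0.
    clob: List[Instr] = [("r0", "add", "r0", "r1"), ("r3", "mul", "r0", 2)]
    for name, code in (("idempotent", idem), ("clobbering", clob)):
        expected = run_region(code, dict(init))["r3"]
        restarted = run_with_restart(code, dict(init), fault_at=1)["r3"]
        print(name, expected, restarted)
    # prints "idempotent 14 14", then "clobbering 14 22"
```

In the second region, the first instruction performs a write-after-read on a live-in register, so re-execution reads the already-updated `r0` and produces a wrong result. Idempotent region construction must avoid exactly this pattern, for example by writing to a fresh register as the first region does.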
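The Preserve idea of saving only live state, and saving it where live state is smallest, can be illustrated with standard backward liveness analysis. This is a minimal sketch under simplifying assumptions. It uses straight-line code only, and each instruction is a hypothetical `(dst, srcs)` pair. The helper names `live_sets`, `best_cut_point`, and `checkpoint` are invented for this example. It is not the authors' region-construction algorithm. The sketch computes the registers live before each instruction, picks the program point with the fewest live registers, and saves only those.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical straight-line instruction: (destination register, source registers).
Instr = Tuple[str, Sequence[str]]


def live_sets(code: List[Instr], live_out: Set[str]) -> List[Set[str]]:
    """Backward liveness: live[i] holds the registers live just before
    instruction i; live[len(code)] holds the registers live after the code."""
    live: List[Set[str]] = [set() for _ in range(len(code) + 1)]
    live[len(code)] = set(live_out)
    for i in range(len(code) - 1, -1, -1):
        dst, srcs = code[i]
        # Live before i: read by i, or live after i and not overwritten by i.
        live[i] = (live[i + 1] - {dst}) | set(srcs)
    return live


def best_cut_point(code: List[Instr], live_out: Set[str]) -> Tuple[int, List[str]]:
    """Return the program point with the fewest live registers (first one on ties)."""
    live = live_sets(code, live_out)
    cut = min(range(len(live)), key=lambda i: len(live[i]))
    return cut, sorted(live[cut])


def checkpoint(regs: Dict[str, int], live: List[str]) -> Dict[str, int]:
    """Save only the live registers instead of the whole register file."""
    return {r: regs[r] for r in live}


if __name__ == "__main__":
    code: List[Instr] = [
        ("r2", ["r0", "r1"]),  # r2 = r0 * r1
        ("r3", ["r0", "r1"]),  # r3 = r0 + r1
        ("r4", ["r2", "r3"]),  # r4 = r2 + r3: the inputs collapse to one value
        ("r5", ["r4"]),        # r5 = r4 + r4
        ("r6", ["r4", "r5"]),  # r6 = r4 * r5
    ]
    live_out = {"r5", "r6"}
    print([len(s) for s in live_sets(code, live_out)])  # [2, 3, 2, 1, 2, 2]
    print(best_cut_point(code, live_out))               # (3, ['r4'])
```

Cutting before the fourth instruction means a context switch there only needs to save `r4`, not every register the region touches. The talk applies the same idea at GPU scale, where the alternative is saving megabytes of register state.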

Context Switch (Preserve idea)
Must save and restore architectural state
But... GPUs have megabytes of register state
Save only live state
Save state at points of minimal live state
[Figure: candidate cut points in region B, annotated with live-register counts of 22, 4, 9, and 23, leading to the exception handler.]
Implicit minimum-live-state checkpoints using idempotence

Speculation (tuning the Creation idea)
Speculation generates state that is wrong
  Needs even more buffers
  Recall: buffers are impractical for GPUs
Use idempotence!
  Reduce re-execution cost by sub-dividing regions
Implicit checkpoints with low re-execution overhead using idempotence

Speculation
[Figure: regions A, B, and C, with B sub-divided into B1 and B2; on misspeculation, execution restarts at B2 rather than at the start of B. # live registers: 2.]
* Region construction details: Idempotent Processing, PLDI ’12

iGPU Architecture
Application → Compiler → Hardware

iGPU Architecture: Software
Form regions (Creation idea): region formation, using region marker instructions
Preserve state (Preserve idea): state preservation, using register reassignment, moves, and spills
[Figure: region formation and state preservation, linked by register pressure.]
Compilation flow: kernel source code → source code compiler → device code generator (region formation, then state preservation) → idempotent device code

iGPU Architecture: Hardware (not to scale)
[Figure: a SIMD processor with fetch unit, decode, RPCs (Creation idea), general-purpose registers, and cores, backed by an L1 cache & TLB and a shared L2 cache.]

iGPU Architecture: Hardware (to scale)
Restart PC registers vs. general-purpose registers
2 RPCs per warp: one each for sparse and short regions
Compare to 1024 GPRs per warp (32 x 32)

iGPU Architecture: Hardware (Preserve idea)
State preservation is handled purely by the compiler!
It is not the hardware’s responsibility

Evaluation
[Figure: iGPU runtime overhead (% overhead) for region creation and for context switch and speculation support; y-axis from 0% to 4.5%.]

Evaluation: Voltage Speculation
[Figure: energy savings on iGPU with voltage emergency prediction (% energy savings); Vdd reduction: 10%, error rate: 0.01%.]

Conclusions
Exception support for GPUs is practical
  Enables better integration with CPUs in CPU-GPU architectures
Speculative execution on GPUs, for both performance and reliability, presents interesting possibilities in the context of “irregular” workloads

Questions