Department of Computer Science

Design, Integration, and Implementation of the DySER Hardware Accelerator into OpenSPARC
Chris Frericks, Jesse Benson, Ryan Cofell, Chen-Han Ho, Venkatraman Govindaraju, Tony Nowatzki, Karthikeyan Sankaralingam
Vertical Research Group
University of Wisconsin-Madison

Executive Summary
  - DySER is a programmable in-core accelerator
  - Minimally invasive, high performance, high energy efficiency
  - Implemented DySER in Verilog RTL
  - Integrated DySER into OpenSPARC
  - Full system implemented on off-the-shelf FPGA
  - Software stack complete
  - 10-month effort

Outline
  - Motivation & Background on DySER
  - What We Did
  - What We Found Challenging
  - What We Learned

Motivation
  - Multi-core trends
      - Performance limited by parallelism
      - Performance limited by voltage scaling
  - Single-core trends
      - Hard to improve performance
  - Accelerators and specialization will likely drive future performance gains

DySER
  - DySER concept first proposed at HPCA 2011
  - Dynamically Specialized Execution Resources
  - 2X to 10X performance increase and 70% reduction in energy
  - Utilizes network of functional units
  - Dynamically specialize hardware to match application phases
  [Figure: Computation Kernel mapped onto DySER]

Generic Processor
  [Figure: pipeline stages Fetch, Decode, Execute, Memory, Writeback; blocks I$, Decode, Register File, Exec Units, D$]

Processor with Integrated DySER
  [Figure: pipeline stages Fetch, Decode, Execute, Memory, Writeback; blocks I$, Decode, Register File, Exec Units, D$, DySER]

Program Execution with DySER
  [Figure: program listing split between Processor and DySER; labels Processor, Config, DySER; operators -, +, x]

Outline
  - Motivation & Background
  - What We Did
  - What We Found Challenging
  - What We Learned

Prototype Components
  [Figure: Software Stack (C Code, Compiler, generated Proc and DySER code) and FPGA (OpenSPARC: I$, Decode, Register File, Exec Units, D$, DySER), connected through the DySER ISA Extensions and the DySER μArch Interface]

Software Stack
  - Utilized LLVM compiler framework and IR
  [Figure: C Code -> Find Frequently Executed Regions -> Map Regions to DySER -> Proc and DySER code]

ISA Extensions
  Instruction      Fields (left to right)
  DySER_Init       10 | config   | 110111 | config   | 000 | config
  DySER_Send       10 | DI1[4:0] | 110111 | RS1[4:0] | DI2[4:0] | V | 001 | RS2[4:0]
  DySER_Receive    10 | DO1[4:0] | 110111 | RD[4:0]  | unused | 010 | unused
  DySER_Load       11 | DI1[4:0] | 000000 | RS1[4:0] | 0 | 1000000000 | RS2[4:0]
  DySER_Store      11 | DI1[4:0] | 000100 | RS1[4:0] | 0 | 1000000000 | RS2[4:0]

DySER μArch Interface
  [Figure: pipeline stages Fetch, Decode, Execute, Memory, Writeback; blocks I$, Decode, Register File, Exec Units, D$, DySER; DySER signals: Configuration, Data In, Data Out]

RTL Modifications
  Stage           Modules Modified                                                              Lines Modified
  Fetch           Fetch Control Logic, Top Level Fetch Module                                   46
  Thread Select   Thread Switch Logic, Thread Completion Control Logic                          19
  Decode          Instruction Decode Logic, Long Latency Instr Control Logic                    210
  Execute         Execute and Bypass Control Logic, Bypass Mux Module, Top Level Exec Module    216
  Store Buffer    Store Buffer Datapath and Control Logic                                       23
  Small change required

FPGA Bring-up
  - Prototype mapped onto Xilinx Virtex 5 FPGA board
  - Boots unmodified OpenSPARC Ubuntu 7.10 Linux
  - DySER not part of critical path!
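
The instruction formats on the ISA Extensions slide can be illustrated with a small encoder. The Python sketch below packs and unpacks a DySER_Send word. The field order and the constants 10, 110111, and 001 are taken from the slide. The exact bit positions are an assumption: the fields are simply laid out from the most significant bit down so that they fill one 32-bit word. The names `SEND_FIELDS`, `pack`, `unpack`, and `encode_dyser_send` are hypothetical and not part of the DySER tool chain.

```python
# Hedged sketch of a DySER_Send encoder.
# ASSUMPTION: bit positions are inferred from field order and widths;
# the slides do not state them explicitly.

# (field name, width in bits), most significant field first.
SEND_FIELDS = [
    ("op", 2),     # constant 10
    ("di1", 5),    # DI1[4:0], DySER input port
    ("op3", 6),    # constant 110111
    ("rs1", 5),    # RS1[4:0], source register
    ("di2", 5),    # DI2[4:0], second DySER input port
    ("v", 1),      # V bit
    ("subop", 3),  # constant 001 selects DySER_Send
    ("rs2", 5),    # RS2[4:0], second source register
]


def pack(fields, values):
    """Concatenate field values into one integer, first field most significant."""
    word = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(fields, word):
    """Split an integer back into its named fields."""
    values = {}
    for name, width in reversed(fields):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


def encode_dyser_send(di1, rs1, di2, v, rs2):
    """Build a 32-bit DySER_Send word under the assumed layout."""
    return pack(SEND_FIELDS, {
        "op": 0b10, "di1": di1, "op3": 0b110111, "rs1": rs1,
        "di2": di2, "v": v, "subop": 0b001, "rs2": rs2,
    })
```

Under these assumed positions, `hex(encode_dyser_send(di1=1, rs1=2, di2=3, v=1, rs2=4))` evaluates to `0x83b88724`, and `unpack(SEND_FIELDS, ...)` recovers the same field values. Keeping the layout in a single table makes it easy to adjust if the real bit positions differ.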
Outline
  - Motivation & Background
  - What We Did
  - What We Found Challenging
  - What We Learned

Compiler Debugging
  - Can't debug internal state of DySER, so a 'debugging' backend was created
  - Converts code generated for DySER back to SPARC
  - Ensures intermediate representations are functionally correct
  [Figure: code split between Proc and DySER, converted back to Proc-only code]

ISA Extensions
  - SPARC ISA uses most of its encoding space
  - Headroom found in 'Implementation Dependent' instructions
  DySER Implementation Dependent Instructions
  Instruction      Fields (left to right)
  DySER_Init       10 | config   | 110111 | config   | 000 | config
  DySER_Send       10 | DI1[4:0] | 110111 | RS1[4:0] | DI2[4:0] | V | 001 | RS2[4:0]
  DySER_Receive    10 | DO1[4:0] | 110111 | RD[4:0]  | unused | 010 | unused

DySER Configuration
  - Original "DySER Config" proposal abandoned
  - Instead: embedded configuration bits in instructions, let the processor handle the rest
  - Mitigates system-level interference
  [Figure: memory contents under each scheme]
    Original proposal: … DySER Config<ptr> …
    Embedded: … DySER_Init<0010101..>
    DySER_Init<0101101..> DySER_Init<1100100..> DySER_Init<0100110..> …
    ptr: 0101010101010 1010101011011 …

FPGA Sizing
  - Fit only a 2x2 DySER with 32-bit datapath
  - Switches contribute heavily to DySER's size
  - Future work includes mapping 8x8 DySER

                       Available   OpenSPARC   OpenSPARC with 2x2 DySER (32-bit)   OpenSPARC with 4x4 DySER (8-bit)
  # Slice Registers    69120       19634       36358                               25616
  # Slice LUTs         69120       31010       57110                               45419

Outline
  - Motivation & Background
  - What We Did
  - What We Found Challenging
  - What We Learned

Can be Integrated into Processor
  - DySER + OpenSPARC prototype: hardware and software work
  - Compiler work in ten months, hardware in six
  - Design of DySER remained modular and extendable

'Least Invasive' Mantra
  - 3 person-months "understanding" OpenSPARC
      - Pipeline organization
      - Verilog RTL
  - Avoided greedy integration strategy
      - E.g. DySER load and store instructions

Speedup less than ideal
  Columns: Benchmark, HPCA-11, After Prototyping
  bzip         1.08x
  hmmer        1.09x   1.30x
  h264ref      1.11x   1.36x
  gobmk        1.07x   1.20x
  libquantum   1.01x   1.09x
  mcf          1.00x   1.30x   1.50x
  Baseline: Single Issue In-Order Pipeline

Documentation
  - "What does this do?"
  - "TBD: is this necessary?"
  - "Kill the next three interrupts, after that, you are on your own."
  - "There must be a cleaner way to do this!"
  - Healthy skepticism of documentation is good

Conclusion
  - DySER is an integrable accelerator
  - Produced prototype in ten months' time with a team of six graduate students
  - Pay attention to system interaction of accelerator
  - Compiler and RTL intensive work feasible with medium effort

Questions?
  [Figure legend: Red = DySER, Green = OpenSPARC, Blue = Mem Controller]

Verification
  - Compiler
  - OpenSPARC/DySER Prototype
      - OpenSPARC comes with VCS test suite
      - Included regression tests for normal SPARC execution
      - Embedded DySER instructions into regression tests
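
The resource counts on the FPGA Sizing slide are easier to compare as percentages of the available slices. The Python sketch below does that arithmetic using only the numbers from that slide. The names `AVAILABLE`, `USAGE`, `utilization`, and `overhead_vs_baseline` are hypothetical, and treating the difference from the plain OpenSPARC build as the cost of adding DySER is an assumption, since the difference also includes any integration logic.

```python
# Hedged sketch: utilization arithmetic over the FPGA Sizing numbers.
# All counts are copied from the slide; helper names are illustrative only.

AVAILABLE = {"slice_registers": 69120, "slice_luts": 69120}

USAGE = {
    "OpenSPARC": {"slice_registers": 19634, "slice_luts": 31010},
    "OpenSPARC with 2x2 DySER (32-bit)": {"slice_registers": 36358, "slice_luts": 57110},
    "OpenSPARC with 4x4 DySER (8-bit)": {"slice_registers": 25616, "slice_luts": 45419},
}


def utilization(design):
    """Return each resource's use as a percentage of what the FPGA offers."""
    used = USAGE[design]
    return {res: 100.0 * used[res] / AVAILABLE[res] for res in AVAILABLE}


def overhead_vs_baseline(design, baseline="OpenSPARC"):
    """Return extra resources relative to the plain OpenSPARC build.

    ASSUMPTION: the whole difference is attributed to DySER and its glue logic.
    """
    return {res: USAGE[design][res] - USAGE[baseline][res] for res in AVAILABLE}


if __name__ == "__main__":
    for design in USAGE:
        pct = utilization(design)
        print(f"{design}: {pct['slice_registers']:.1f}% registers, "
              f"{pct['slice_luts']:.1f}% LUTs")
```

With these numbers, the 2x2 32-bit configuration uses roughly 83% of the slice LUTs, which is consistent with the slide's statement that only a 2x2 DySER fit at a 32-bit datapath.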