A Reconfigurable Computing System Based on a Cache-Coherent Fabric
Presenter: Neal Oliver, Intel Corporation
June 10, 2012
Authors: Neal Oliver, Rahul R. Sharma, Stephen Chang, Bhushan Chitlur, Elkin Garcia, Joseph Grecco, Aaron Grier, Nelson Ijih, Yaping Liu, Pratik Marolia, Henry Mitchel, Suchit Subhaschandra, Arthur Sheiman, Tim Whisonant, and Prabhat Gupta
Presented at CARL 2012

Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
Intel product plans in this presentation do not constitute Intel plan-of-record product roadmaps. Please contact your Intel representative to obtain Intel's current plan-of-record product roadmaps.
The products described in this document may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copyright © 2012, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Topics
• Motivation for this work
• Usage Models
• Overview of Intel QuickPath Interconnect (QPI)
• Platform Architecture
• Hardware Architecture
• Programming Model
• Simulation
• Future Work

Motivation for this work
• Keep innovation on Intel platforms
• High-throughput, low-latency attachment to the server
• Cache-coherent memory
• Enlarge the memory space available to the FPGA platform
• Enable additional programming paradigms
• Drive requirements for future fabrics
• Platform for simulation/emulation

Usage Models
Accelerators
• Algorithmic accelerators
• e.g. seismic imaging, genomics, computational finance
[Figure: CPU attached over QPI to an accelerator FPGA]
HW/SW Co-Emulation Platforms
• Enables development of high-performance emulation platforms
• Pre-silicon SW development
[Figure: CPU attached over QPI to ASIC logic on the Intel QuickAssist Accelerator Platform (QAP)]
ASIC Prototyping
• Significantly reduces risk for ASICs connecting to QPI
• e.g. QPI-attached ASICs for telecom, node controllers
[Figure: CPU attached over QPI to a QPI-bridge FPGA fronting ASIC RTL in an FPGA farm or big-box emulator]

Intel QuickPath Interconnect (QPI)
• QPI: a low-latency, point-to-point coherent system interconnect (a conceptual sketch of the shared-memory pattern this enables follows below)
• QPI Caching Agent (CA) – cache devices
• QPI Home Agent (HA) – DDR memory controller
• Latest Intel platforms have I/O integrated into the CPU (not shown)
[Figure: four-socket and two-socket platform topologies; P = processor, M = memory, CS = chipset; processors connected by QPI links, with I/O attached to the chipsets]
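The cache-coherent attachment summarized above is what lets the CPU and an accelerator communicate through ordinary shared memory rather than through explicit copy or DMA driver calls. The following is a minimal conceptual sketch of that pattern, not Intel or AAL code: a plain CPU thread stands in for the AFU, and the WorkDescriptor layout, doorbell protocol, and fake_afu function are all hypothetical illustrations of how a coherent producer/consumer handshake might look to software.

```cpp
// Conceptual sketch only: a CPU thread stands in for a coherent AFU to show
// the shared-memory handshake that a cache-coherent fabric such as QPI enables.
// No Intel/AAL APIs are used; all names here are hypothetical.
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct WorkDescriptor {
    const uint64_t*  src;       // input buffer shared with the "AFU"
    uint64_t*        dst;       // output buffer, written by the "AFU"
    size_t           count;     // number of elements to process
    std::atomic<int> doorbell;  // 0 = idle, 1 = work ready, 2 = done
};

// Stand-in for an AFU: polls the shared descriptor, does the work, signals completion.
void fake_afu(WorkDescriptor* wd) {
    while (wd->doorbell.load(std::memory_order_acquire) != 1) { /* spin */ }
    for (size_t i = 0; i < wd->count; ++i)
        wd->dst[i] = wd->src[i] * 2;                  // trivial "acceleration"
    wd->doorbell.store(2, std::memory_order_release); // completion visible to the CPU
}

int main() {
    std::vector<uint64_t> in(1024, 21), out(1024, 0);
    WorkDescriptor wd{in.data(), out.data(), in.size(), {0}};

    std::thread afu(fake_afu, &wd);                   // real hardware would already be watching
    wd.doorbell.store(1, std::memory_order_release);  // "ring the doorbell" via shared memory
    while (wd.doorbell.load(std::memory_order_acquire) != 2) { /* spin */ }
    afu.join();

    std::printf("out[0] = %llu\n", (unsigned long long)out[0]); // expect 42
    return 0;
}
```

On the real platform, the "AFU side" of this handshake would be implemented in FPGA logic behind the CCI/SPL interfaces described later, with the coherence protocol, rather than software copies, keeping both views of the buffers consistent.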
QPI-Attached Accelerator Hardware Module (AHM)
• Substitution of the AHM for a CPU: the AHM is installed in a server socket
• FPGA modules shown: Xilinx Virtex 6 module; PCIe-attached Altera Stratix IV module
• Host processor: Intel® Xeon processor 7000 series
[Figure: two-socket platform with the AHM replacing one processor and attached to the remaining processor over QPI links]

Platform Architecture
• SW elements (CPU side): Host Application → Accelerator Abstraction Layer (AAL interface) or direct driver interface, over the CPU's QPI link/protocol layer and PHY
• HW stack (FPGA side): Accelerator Function Unit(s) (AFUs) → System Protocol Layer (SPL interface) → Core Cache Interface (CCI) → link and protocol layer → PHY
• The CPU and FPGA PHY/link layers are connected by Intel QPI

Hardware Architecture (FPGA)
Accelerator Function Units (AFU)
• Application-specific hardware module: the user algorithm to be accelerated
• Can connect to the SPL or CCI interface
System Protocol Layer (SPL) – virtual memory subsystem
• Implements virtual-to-physical address translation
• Handles all cache-line reordering in blocks of data
• DMA engine
• Memory management and protection
QPI Link and Protocol (QLP)
• Implements the QPI Caching Agent (cache) and Home Agent (memory controller)
• 64 KB to 256 KB set-associative cache
• Manages coherency of FPGA-attached DDR memory
QPI PHY (QPH)
• FPGA high-speed I/O operating at 4.8 to 6.4 GT/s
• Full width: 20 data lanes per direction

QPI Home+Cache Agent Architecture
• Modular: CA, HA, or CA+HA
• Interfaces: CCI hides the complexity of the underlying coherent fabric; MC (optional) provides an interface to the memory controller
• QPH electrical layer uses existing FPGA I/Os
• QLP designed in soft logic, tightly integrated, highly parallel
• Programmable cache/snoop tables
[Figure: QLP block diagram — core cache module (cache tag, snoop tag, cache data RAM, Rx/Tx control, memory queue), SPL and memory-controller (MC) interfaces to the AFU and the DDR DIMM, and the QPH PHY exchanging 8-flit (640-bit) packets as phits over the QPI link to the CPU]

Cache Hit/Miss Protocol Flow
• Cache hits are an order of magnitude faster than misses
• Prefetch striding patterns to improve the cache hit rate

Programming Model
AAL functions:
• Allocate shared workspaces (WSi) and pin them in system memory
• Support both physical and virtual memory access
• Allocate and manage AFUs for the application (via a "proxy AFU" (PAFUi) abstraction)
• Provide a remote procedure call (RPC) abstraction of the AFU to the application (illustrated in the sketch following this slide)
• Enables staged development of AFU algorithms
Proxy AFU:
• Provides an abstraction of the AFU to the application (CPU software)
[Figure: user-space application, user library, and AAL with proxy AFUs PAFU0..PAFUn; the Accelerator Interface Adapter (AIA) in kernel space; shared workspaces WS0..WSn in system memory; and the QuickAssist FPGA hardware module (QPH + QLP, SPL, AFU0..AFUn) attached over QPI]
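The host-side flow implied by the Programming Model slide can be made concrete in code. The sketch below deliberately does not use the real AAL headers or class names, which the slides do not show: Workspace, ProxyAFU, MemcpyAFU, and AcceleratorRuntime are hypothetical stand-ins, and the "AFU" is a CPU-side memcpy so the example is self-contained and runnable. It only illustrates the sequence described above: allocate and pin shared workspaces, acquire a proxy AFU, and drive it through an RPC-style call.

```cpp
// Hypothetical host-side flow mirroring the Programming Model slide.
// These classes are illustrative stand-ins, NOT the real AAL API; the stub
// "AFU" below runs on the CPU so the sketch compiles and runs on its own.
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Shared workspace: memory addressable by both CPU and AFU. (Here it is
// ordinary heap memory; the real AAL pins pages in system memory and exposes
// a physical or SPL-translated address to the hardware.)
struct Workspace {
    std::vector<uint8_t> buf;
    void*  virt()        { return buf.data(); }
    size_t bytes() const { return buf.size(); }
};

// Proxy AFU (PAFU): the application-facing handle for one hardware AFU.
class ProxyAFU {
public:
    virtual void run(Workspace& in, Workspace& out) = 0;  // RPC-style invocation
    virtual ~ProxyAFU() = default;
};

// Stub standing in for a hardware AFU: a plain CPU memcpy.
class MemcpyAFU : public ProxyAFU {
public:
    void run(Workspace& in, Workspace& out) override {
        std::memcpy(out.virt(), in.virt(), in.bytes());
    }
};

// Stand-in for the Accelerator Abstraction Layer runtime.
class AcceleratorRuntime {
public:
    Workspace allocate_workspace(size_t bytes) {
        return Workspace{std::vector<uint8_t>(bytes)};     // real AAL would also pin it
    }
    std::unique_ptr<ProxyAFU> acquire_afu(const char* /*name*/) {
        return std::make_unique<MemcpyAFU>();              // would bind PAFUi to AFUi over QPI
    }
};

int main() {
    AcceleratorRuntime aal;                                 // hypothetical runtime object
    Workspace in  = aal.allocate_workspace(1 << 20);        // "WS0": 1 MiB input
    Workspace out = aal.allocate_workspace(1 << 20);        // "WS1": 1 MiB output

    auto afu = aal.acquire_afu("memcpy_afu");               // AFU name is made up
    // ... fill in.virt() with application data ...
    afu->run(in, out);                                      // results appear in out.virt()
    return 0;
}
```

In a real deployment, acquire_afu would bind the proxy to a hardware AFU behind the AIA and QPI stack, run() would hand the workspace addresses to the AFU through CCI or SPL, and the same application code could first be validated against the AFU Simulation Environment described on the next slide.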
AFU Simulation Environment (ASE)
• AFU HW/SW co-design environment
• Models platform behavior
• Design and validate the SW against the AFU RTL in the simulation environment
• Faster simulation
• Ease of debug
[Figure: SW application and the Accelerator Interface Adapter (AIA) driver/monitor on the software side; the AFU RTL with SPL or CCI running in a ModelSim¹/VCS¹ runtime on the hardware side; a transaction-level AFU simulator and modeled system memory connect the two]
¹ Intel and Xeon are registered trademarks of Intel Corporation. Other trademarks are the property of their respective owners.

Future Work
• Research and pathfinding on other accelerator architectures
• Programming language/compiler development
• Benchmarking and performance characterization

Thank You
Contact: p.k.gupta@intel.com, neal.oliver@intel.com