A Reconfigurable Computing System Based on a Cache-Coherent Fabric Presenter: Neal Oliver

advertisement
A Reconfigurable Computing System
Based on a Cache-Coherent Fabric
Presenter: Neal Oliver
Intel Corporation
June 10, 2012
Authors- Neal Oliver, Rahul R Sharma, Stephen Chang, Bhushan Chitlur, Elkin
Garcia, Joseph Grecco, Aaron Grier, Nelson Ijih, Yaping Liu, Pratik Marolia, Henry
Mitchel, Suchit Subhaschandra, Arthur Sheiman, Tim Whisonant, and Prabhat
Gupta
Presented at CARL 2012
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS
OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS
DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL
ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING
TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A
PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER
INTELLECTUAL PROPERTY RIGHT.
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED
FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE
PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not
rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel
reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities
arising from future changes to them. The information here is subject to change without notice. Do not finalize a
design with this information.
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your
Intel representative to obtain Intel’s current plan of record product roadmaps.
The products described in this document may contain design defects or errors known as errata which may cause the
product to deviate from published specifications. Current characterized errata are available on request.
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are
subject to change without notice.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your
product order.
Copyright © 2012, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others
2
Topics
• Motivations for this work
• Usage Models
• Overview of Intel QuickPath Interconnect (QPI)
• Platform Architecture
• Hardware Architecture
• Programming Model
• Simulation
• Future Work
3
Motivation for this work
• Keep innovation on Intel platforms
• High-throughput, low-latency attachment to server
• Cache-coherent memory
• Enlarge memory space for FPGA platform
• Enable additional programming paradigms
• Drive requirements for future fabrics
• Platform for simulation/emulation
4
Usage Models
Accelerators
•Algorithmic accelerators
•e.g. Seismic imaging,
genomics, computational
finance
CPU QPI Accel
FPGA
HW/SW Co-Emulation
Platforms
• Enables development of
high performance
emulation platforms
• Pre-Si SW development
5
QPI
ASIC
Logic
Intel QuickAssist
Accelerator
Platform (QAP)
ASIC Prototyping
•Significantly reduces risk
for ASICs connecting to QPI
•e.g. QPI attached ASICs for
telecom, node controllers
QPI
RTL
ASIC
CPU QPI Bridge
FPGA
FPGA Farm
OR
Big Box Emulator
Intel QuickPath Interconnect (QPI)
QPI: A low latency, point-to-point coherent system interconnect.
QPI Caching Agent (CA) – cache devices
QPI Home Agent (HA) – DDR memory controller
Latest Intel platforms have IO integrated into CPU (not shown)
Four Socket Platform
M
6
M
M
QPI
Links
QPI
Links
M
P1
P0
P2
Two Socket Platform
P3
CS
CS
I/O I/O
I/O I/O
M
P0
QPI
Links
QPI
Links
P1
CS
M
Memory
I/O I/O
P
Processor
CS
Chipset
QPI-attached Accelerator Hardware Module
(AHM)
Substitution of AHM for
CPU
M
P
AHM installed in Server
AHM
M
P
M
Accelerator
Hardware
Module (AHM)
QPI
Links
M
P
CS
QPI
Links
I/O I/O
FPGA
Xilinx Virtex 6
Module
CS
I/O I/O
PCIe attached FPGA
Altera Stratix IV
Module
Intel® Xeon processor 7000 series
7
Platform Architecture
CPU
AAL
Interface
Direct
Driver
Interface
SW elements
Host Application
FPGA
Accelerator
Abstraction Layer
System
Protocol Layer
Link and Protocol
Link and Protocol
PHY
PHY
Intel QPI
8
Accelerator
Function Unit(s)
(AFUs)
SPL
Interface
CCI Core
Cache
Interface
QPI HW stack
FPGA
Accelerator
Function Units
(AFU)
System Protocol
Layer
(virtual memory
subsystem)
9
•Application-specific hardware module, user algorithm to
be accelerated.
•Can connect to SPL or CCI interface
•Implements virtual to physical address translation
•Handles all cache line reordering in blocks of data
•DMA engine
•Memory management and protection
QPI Link and
Protocol (QLP)
•QPI Caching (cache) and Home Agent (memory controller)
implemented
•64KB to 256KB set-associative cache
•Manages coherency of FPGA attached DDR memory
QPI PHY (QPH)
•FPGA high speed IO operating at 4.8 to 6.4 GT/sec
•Full width : 20 data lanes per direction
QPI Home+Cache Agent Architecture
• Modular: CA, HA,
CA+HA
• Interfaces:
CCI- hides complexity
of underlying coherent
fabric.
MC (optional) –
provides interface to
memory controller.
• QPH electrical uses
existing FPGA IOs
• QLP designed using
soft logic, tightly
integrated, highly
parallel.
• Programmable
cache/snoop tables
To
DIMM
To AFU
DDR
SPL
Memory Controller
CCI
Cache
Data
RAM
MC interface
Memory
Queue
Core Cache
Module
Rx
Control
Snoop tag
Cache tag
Tx
Control
QPI Link + Protocol
Control
8 flits (640 bits)
QPI PHY
Rx
Align
(QPH)
phits
QPI link to CPU
10
QLP
Snoop
table
Cache
table
8 flits (640 bits)
Tx
Align
Cache Hit/Miss protocol flow
11
•
Cache hits
are an order
of magnitude
faster.
•
Prefetch
striding
patterns to
improve
cache hit rate
Programming Model
AAL functions:
• Allocate shared workspaces
WSi and pin in system
memory
• Support both physical and
virtual memory access
• Allocate and manage AFUs
for application (via “proxy
AFU” (PAFUi) abstraction)
• Provide remote procedure
call (RFP) abstraction of
AFU to application
Application
User Library
Accelerator Abstraction Layer (AAL)
PAFU 0
PAFU 1
….
• Enables staged
development of AFU
algorithms
12
User
Space
PAFU n
Accelerator Interface Adaptor (AIA )
Kernel
Space
QPI
WS0
WS1
….
WSn
System
Memory
QPI
QuickAssist FPGA
Hardware Module
Proxy AFU:
• Provide abstraction of AFU
to application
( CPU
SOFTWARE)
QPI Physical (QPH) + Link/Protocol layer
( QLP )
System Protocol Layer (SPL)
AFU0
AFU1
….
AFUn
AFU Simulation Environment (ASE)
• AFU HW-SW co-design
environment.
• Models platform
behavior.
• Design and validate the
SW against the AFU RTL
in the simulator
environment.
SW APP
AFU RTL
SPL or
CCI
Accelerator Interface
Adapter (AIA)
Driver/Monitor
Modelsim1/VCS1 runtime
AFU Simulator
(Transaction Level Model)
• Faster simulation.
• Ease of debug.
System
Memory
– Intel and Xeon are registered trademarks of Intel Corporation. Other trademarks are the property of
their respective owners.
1
13
Future Work
• Research and pathfinding on other accelerator architectures
• Programming language/compiler development
• Benchmarking and performance characterization
14
Thank You
Contact – p.k.gupta@intel.com
neal.oliver@intel.com
Download