High Performance Reconfigurable Computing - Ann Gordon-Ross

August 1, 2008
High Performance Reconfigurable Computing
Ann Gordon-Ross, Ph.D.
NSF CHREC Center
Assistant Professor of ECE, University of Florida
(on behalf of faculty/staff of CHREC at UF, GWU, BYU, and VT)
What is CHREC?

- NSF Center for High-Performance Reconfigurable Computing
- Unique US national research center in this field, established Jan '07
- Leading research groups in RC/HPC/HPEC at four major universities
  - University of Florida (lead) and George Washington University: founding sites (2007-)
  - Brigham Young University and Virginia Tech: expansion sites (2008-)
- Under the auspices of the I/UCRC Program at NSF
  - Industry/University Cooperative Research Center
  - CHREC is supported by the CISE & Engineering Directorates at NSF
- CHREC is both a National Center and a Research Consortium
  - University groups serve as the research base (faculty, students, staff)
  - Industry & government organizations are research partners, sponsors, collaborators, advisory board, & technology-transfer recipients
Objectives for CHREC

- Serve as foremost national research center in this field
  - Basis for long-term partnership and collaboration amongst industry, academe, and government; a research consortium
  - RC: from supercomputers to high-speed embedded systems
- Directly support research needs of our Center members
  - Highly cost-effective manner with pooled, leveraged resources and maximized synergy
- Enhance educational experience for a large set of high-quality graduate and undergraduate students
  - Ideal recruits after graduation for Center members, many US
- Advance knowledge and technologies in this field
  - Commercial relevance ensured with rapid technology transfer
NSF Model for I/UCRC Centers

[Diagram: spectrum of research interaction from basic research (university) to applied research & development (industry), with I/U Centers bridging the two.]
CHREC Members

1. AFRL Munitions Directorate
2. AFRL Space Vehicles Directorate
3. Altera
4. AMD
5. Arctic Region Supercomputing Center
6. Boeing
7. Cadence
8. GE Aviation Systems
9. Gedae
10. Harris Corp.
11. Hewlett-Packard
12. Honeywell
13. IBM Research
14. Intel
15. L-3 Communications
16. Lockheed Martin MFC
17. Lockheed Martin SSC
18. Los Alamos National Laboratory
19. Luna Innovations
20. NASA Goddard Space Flight Center
21. NASA Langley Research Center
22. NASA Marshall Space Flight Center
23. National Instruments
24. National Reconnaissance Office
25. National Security Agency
26. Network Appliance
27. Office of Naval Research
28. Raytheon
29. Rincon Research Corp.
30. Rockwell Collins
31. Sandia National Laboratory NM

>30 members with >40 memberships in 2008 (founding members since 2007 plus new members in 2008; several members support multiple CHREC memberships & students in 2008).
[Map: the CHREC members listed above grouped around the four university sites, CHREC (UF), CHREC (GWU), CHREC (BYU), and CHREC (VT).]
CHREC Faculty

- University of Florida (lead)
  - Dr. Alan D. George, Professor of ECE – Center Director
  - Dr. Herman Lam, Associate Professor of ECE
  - Dr. K. Clint Slatton, Assistant Professor of ECE and CCE
  - Dr. Greg Stitt, Assistant Professor of ECE
  - Dr. Ann Gordon-Ross, Assistant Professor of ECE
  - Dr. Saumil Merchant, Post-doc Research Scientist
- George Washington University
  - Dr. Tarek El-Ghazawi, Professor of ECE – GWU Site Director
  - Dr. Ivan Gonzalez, Research Scientist / Visiting Faculty in ECE
  - Dr. Vickram Krishnanarayana, Dr. Proshanta Saha, and Dr. Harald Simmler, Post-doc Research Scientists
- Brigham Young University
  - Dr. Brent E. Nelson, Professor of ECE – BYU Site Director
  - Dr. Michael J. Wirthlin, Associate Professor of ECE
  - Dr. Michael Rice, Professor of ECE
  - Dr. Brad L. Hutchings, Professor of ECE
- Virginia Tech
  - Dr. Shawn A. Bohner, Associate Professor of CS – VT Site Director
  - Dr. Peter Athanas, Professor of ECE
  - Dr. Wu-Chun Feng, Associate Professor of CS and ECE
  - Dr. Francis K.H. Quek, Professor of CS

CHREC features a strong team of >40 graduate students spanning the four university sites.
Membership Fee Structure

- NSF provides base funds for CHREC via I/UCRC grants
  - Base grant to each participating university site to defray admin costs
- Industry and govt. partners support CHREC through memberships
  - NOTE: Each membership is associated with ONE university
  - Partners may hold multiple memberships (supporting multiple students) at one or multiple sites (AFRL/MD, GSFC, Honeywell, LANL, MSFC, NRO, NSA, & SNL in '08)
- Membership Fee: $35K per annum
  - Why $35K unit? Base cost of graduate student for one year
    - Stipend, tuition, and related expenses (IDC is waived, otherwise >$50K)
  - Fee represents tiny fraction of budget (1-2%; quick check below) & benefits of Center
    - CHREC budget projected at ~$3M/yr in 2008
    - Equivalent to >$10M if Center founded in govt. or industry
- Each university invests in various costs of CHREC operations
  - 25% matching of industry membership contributions (equip.)
  - Indirect Costs waived on membership fees (~1.5x multiplier)
  - Matching on administrative personnel costs

More bang for your buck!
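A quick arithmetic check of the "tiny fraction" figure above, using only the numbers quoted on this slide:

$$\frac{\$35\mathrm{K\ membership}}{\sim\$3\mathrm{M/yr\ projected\ budget}} \approx 1.2\%$$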
Center Membership Benefits

- Research and collaboration
  - Selection of project topics that membership resources support
  - Direct influence over cutting-edge research of prime interest
  - Review of results on semiannual formal basis & continual informal basis
  - Rapid transfer of results and IP from projects @ ALL sites of CHREC
- Leveraging and synergy
  - Highly leveraged and synergistic pool of funding resources
  - Cost-effective R&D in today's budget-tight environment, ideal for ROI
- Multi-member collaboration
  - Many benefits between members
  - e.g. new industrial partnerships & teaming opportunities
- Personnel
  - Access to strong cadre of faculty, students, post-docs
- Recruitment
  - Strong pool of students with experience on industry & govt. R&D issues
- Facilities
  - Access to university research labs with world-class facilities
  - e.g. RC testbed equipment from Alpha Data, Altera, Celoxica, Cray, DRC, GiDEL, MathStar, Nallatech, SGI, SRC, XDI, Xilinx, et al., plus custom
CHREC & OpenFPGA

[Diagram, c/o Dr. Eric Stahlberg: relationship between CHREC and the OpenFPGA community in terms of research context and support, technology innovations, production, and utilization.]
Education & Outreach

- CHREC is enabling advancements at all its sites
  - New & updated courses
  - Degree curricula enhancements
  - Student internship connections
  - Visiting scholars
- Example: new RC courses @ Florida site
  - New undergraduate (EEL4930) & graduate (EEL5934) dual-listed courses in RC began in Fall Term 2007
  - Lectures, lab experiments, research projects
    - Fundamental topics
    - Special topics from research in CHREC
  - Supported by new RC teaching cluster
    - Sponsored in part by $25K education grant from Rockwell Collins
    - Cluster of 21 RC servers, each housing a Nallatech PCI-X card with Xilinx Virtex-4 LX100 user FPGA
Reformations and RC
Architecture Reformation

- End of the wave of riding Moore's Law via fclk + ILP (CPU)
  - Explicit parallelism & multicore the new wave
- Many promising technologies on new wave
  - Fixed & reconfigurable multicore device architectures
- Many R&D challenges lie on new wave
  - Tried & true methods no longer sufficient; complexity abounds
  - Semantic gap widening between applications & systems
    - e.g. app developers must now understand & exploit parallelism (see the sketch after this slide)
- Inherent traits of fixed device architectures
  - App-specific: inflexible, expensive (e.g. ASIC)
  - App-generic: power, cooling, & speed challenges (e.g. Opteron)
  - Many niches between extremes (Cell, DSP, GPU, NP, etc.)
- Reconfigurable architectures promise best of both worlds
  - Speed, flexibility, low power, adaptability, economy of scale, size
  - Bridging embedded & general-purpose computing, superset of fixed
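To make the "explicit parallelism" point concrete, here is a minimal C/OpenMP sketch (illustrative only, not from the slides): the application developer, not the hardware or an automatic compiler pass, states how the loop is split across cores and how the partial results are combined.

```c
/* Build with: cc -fopenmp dot.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    double dot = 0.0;

    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* The parallelism is explicit: the developer marks the loop and the
       reduction; correctness and load balance are now their concern. */
    #pragma omp parallel for reduction(+:dot)
    for (int i = 0; i < N; i++)
        dot += a[i] * b[i];

    printf("dot = %.1f using up to %d threads\n", dot, omp_get_max_threads());
    return 0;
}
```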
What is a Reconfigurable Computer?

- System capable of changing hardware structure to address application demands
  - Static or dynamic reconfiguration
  - Reconfigurable computing, configurable computing, custom computing, adaptive computing, etc.
  - Often a mix of conventional fixed & reconfigurable devices (e.g. control-flow CPUs, data-flow FPLDs)
- Enabling technology?
  - Field-programmable multicore devices
  - FPGA is "King" (but space is broadening): FPGA, ECA, FPCA, FPOA, MPPA, TILE, XPP, et al.
- Applications?
  - Vast range – computing and embedded worlds
  - Faster, smaller, less power & heat, adaptable & versatile, selectable precision, high comp. density
Opportunities for RC?

[Chart: flexibility vs. performance spectrum spanning general-purpose processors, special-purpose processors (e.g. DSPs, NPs), reconfigurable computing (e.g. FPGAs), and ASICs.]

- 10-100x speedups with 2-10x energy savings not uncommon (see the note below)
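Reading those two ranges together (an illustrative back-of-the-envelope relation, not from the original slides): energy is power multiplied by time, so even an accelerated configuration that draws more power than the baseline saves energy whenever the speedup exceeds the power ratio.

$$\frac{E_{\mathrm{RC}}}{E_{\mathrm{CPU}}}
  = \frac{P_{\mathrm{RC}}\,t_{\mathrm{RC}}}{P_{\mathrm{CPU}}\,t_{\mathrm{CPU}}}
  = \frac{P_{\mathrm{RC}}/P_{\mathrm{CPU}}}{\text{speedup}},
  \qquad \text{e.g. } \frac{2}{10} = 0.2 \;\Rightarrow\; 5\times \text{ energy savings.}$$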
When and Where to Apply RC?

- When do we need it?
  - When performance & versatility are critical
    - Hardware gates targeted to application-specific requirements
    - System mission or applications change over time
  - When the environment is restrictive
    - Limited power, weight, area, volume, etc.
    - Limited communications bandwidth for work offload
  - When autonomy and adaptivity are paramount
- Where do we need it?
  - In conventional servers, clusters, and supercomputers (HPC)
    - Field-programmable hardware fits many demands
    - High DoP (degree of parallelism), finer grain, direct data-flow mapping, bit manipulation, selectable precision, direct control over H/W (e.g. perf. vs. power); see the small example after this slide
  - In space, air, sea, undersea, and ground systems (HPEC)
    - Embedded & deployable systems can reap many advantages w/ RC
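As a small illustration of the "bit manipulation, selectable precision" traits listed above, here is a hypothetical C reference kernel (not from the slides): in an FPGA mapping, the 12-bit saturating adder and the bit reversal each become a few levels of logic or plain wiring, sized to exactly the precision the data need, rather than sequences of 32/64-bit ALU instructions.

```c
#include <stdint.h>
#include <stdio.h>

/* Software reference for a bit-level kernel of the sort that maps naturally
 * onto FPGA fabric: narrow, saturating arithmetic plus bit shuffling. */

static uint16_t sat12_add(uint16_t a, uint16_t b) {
    uint32_t s = (uint32_t)a + b;
    return (s > 0x0FFF) ? 0x0FFF : (uint16_t)s;   /* saturate to 12 bits */
}

static uint8_t reverse_bits8(uint8_t x) {          /* pure wiring in hardware */
    x = (uint8_t)((x >> 4) | (x << 4));
    x = (uint8_t)(((x & 0xCC) >> 2) | ((x & 0x33) << 2));
    x = (uint8_t)(((x & 0xAA) >> 1) | ((x & 0x55) << 1));
    return x;
}

int main(void) {
    uint8_t samples[8] = {3, 250, 17, 99, 180, 7, 64, 255};
    uint16_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc = sat12_add(acc, reverse_bits8(samples[i]));  /* one pipeline stage per op in HW */
    printf("12-bit saturated result: 0x%03X\n", acc);
    return 0;
}
```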
Multicore/Many-Core Taxonomy

Riding the new multi-core (MC) wave of Moore's Law.

[Taxonomy diagram: MC devices are classed as fixed architecture (FA), reconfigurable architecture (RA), or hybrid (devices with segregated RA & FA resources that can use either in stand-alone mode); each class may be homogeneous or heterogeneous, with a spectrum of granularity in each class.]

Reconfigurability factors: datapath, device memory, PE/block, precision, interface, mode, power, interconnect.
Reconfigurability Factors

[Figure: illustrations of each reconfigurability factor, e.g. datapath (register/multiply/add structures), PE/block (an 8x8 multiply vs. an 8x8 MAC processing element), precision (24-bit vs. 64-bit multiply), device memory (e.g. a 64 KB x 64 organization), interface (memory controller targeting RLDRAM vs. DDR2 SDRAM), mode (PE1-PE4 switching among programs Prg-A through Prg-D), interconnect (PE/memory mesh topologies), and the resulting performance/power trade-offs.]
Future Convergence

- Rising development costs & other factors drive convergence
  - As seen in many other technologies
  - (Convergence figure source: ASIC Design in the Silicon Sandbox: A Complete Guide to Building Mixed-Signal Integrated Circuits, © 2006, The McGraw-Hill Companies.)
- Device architecture convergence?
  - Many-core driven by densities
  - Heterogeneous? Cell as initial example; Intel and AMD both cite heterogeneous MC in their future, to the extent complexity is manageable
- Hybrid multi-/many-core architecture: fixed architecture + reconfigurable architecture, homogeneous or heterogeneous
  - Performance + versatility
  - Adaptive for many apps, missions
  - Avoid limitations of fixed architectures
  - Manage issues of heterogeneity
Application Reformation

- Dawn of reformation in application development methods
  - Driven by architecture reformation; complexity management
  - Holistic concepts, methods, & tools must emerge
- Semantic gap widening between apps & archs
  - Architectures increasingly complex to target by apps
  - MC world (fixed or RC), explicit parallelism
    - New to fixed MC world, familiar to RC/FPGA & HPC worlds
  - Optimizing compiler ≠ parallelizing compiler (the elephant in the living room)
    - Domain scientist involved in comp. structure of their app
- How do we bridge the semantic gap?
  - Focus upon computational fundamentals
    - Formal models, complexity management via abstraction, encapsulation
  - Learn lessons from other engineering fields
    - e.g. aerospace engineers do not flight-test first, why must we?
  - Build basis for an RC engineering discipline
    - Leverage where practical for fixed MC world
Reformation in App Development

- FDTE as formal model: spectrum of application development phases
- Applies throughout computing; we focus on RC, which involves hardware & software design

I. Formulation (strategic design playground, abstraction, prediction)
  (a) Algorithm design exploration
  (b) Architecture design exploration
  (c) Performance prediction (speed, area, etc.)
II. Design (tactics, coding, details)
  (a) Linguistic design semantics and syntax
  (b) Graphical design semantics and syntax
  (c) Hardware/software codesign
III. Translation (conversion to executable form)
  (a) Compilation
  (b) Libraries and linkage
  (c) Technology mapping (synthesis, place & route)
IV. Execution (services, debug, optimization)
  (a) Test, debug, and verification
  (b) Performance analysis and optimization
  (c) Run-time services
Research Activities
2007 CHREC Projects (Florida Site)

[Diagram: five project areas around the theme "Performance, Adaptability, Fault Tolerance, Scalability, Power, Density": F1 Performance Prediction, F2 Performance Analysis, F3 Application Case Studies & HLLs, F4 Systems Architecture, F5 Device Architecture.]

- F1-07: Simulative Performance Prediction
  - Before you invest major $$$ in new systems, software design, & hardware design, better to first predict potential benefits (see the sketch after this slide)
- F2-07: Performance Analysis & Profiling
  - Without new concepts and powerful tools to locate and resolve performance bottlenecks, max. speedup is extremely elusive
- F3-07: Application Case Studies
  - RC for HPC or HPEC is relatively new & immature; need to build/share new knowledge with apps & tools from case studies
- F4-07: Partial Run-Time Reconfiguration
  - Many potential advantages to be gained in performance, adaptability, power, safety, fault tolerance, security, etc.
- F5-07: FPLD Device Architectures & Tradeoffs
  - How to understand and quantify performance, power, et al. advantages of FPLDs vs. competing processing technologies
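A minimal sketch of what "predict before you invest" can look like (a hypothetical first-order model for illustration; it is not CHREC's RAT methodology or tools): estimate transfer and compute time for an FPGA port from a few parameters and compare against the measured CPU baseline.

```c
#include <stdio.h>

/* Hypothetical first-order speedup predictor for offloading a kernel to an
 * FPGA: total RC time = data transfer + FPGA compute; compare against the
 * measured CPU time.  All parameter values are illustrative placeholders. */
typedef struct {
    double bytes_in, bytes_out;   /* data moved per invocation            */
    double xfer_gbps;             /* sustained interconnect bandwidth     */
    double elements;              /* work items per invocation            */
    double ops_per_element;       /* operations per work item             */
    double fpga_ops_per_sec;      /* pipelined throughput of the design   */
    double cpu_time_s;            /* measured baseline time on the CPU    */
} kernel_model;

static double predict_speedup(const kernel_model *k) {
    double t_comm = (k->bytes_in + k->bytes_out) / (k->xfer_gbps * 1e9);
    double t_comp = (k->elements * k->ops_per_element) / k->fpga_ops_per_sec;
    return k->cpu_time_s / (t_comm + t_comp);
}

int main(void) {
    kernel_model k = { 64e6, 64e6, 1.0, 16e6, 200.0, 20e9, 1.2 };
    printf("predicted speedup: %.1fx\n", predict_speedup(&k));
    return 0;
}
```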
2007 CHREC Projects (GWU Site)

- G1-07: SW/HW Partitioning & Co-Scheduling
  - Algorithms and tools for profiling, partitioning, co-scheduling, and targeting of RC systems (see the toy example after this slide)
- G4-07: High-Level Languages Productivity
  - Insight to understand underlying differences among available tools, guide programmer in choosing correct language, impact future HLL development
- G5-07: Library Portability and Acceleration Cores
  - Framework for portable and reusable library of hardware cores populated with key modules for RC
  - Initial focus upon case studies in computational biology & medical imaging
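G1-07's partitioning problem can be made concrete with a toy greedy heuristic (a hypothetical sketch, not the project's actual algorithms): given profiling data, move to hardware the kernels that save the most time per unit of FPGA area, until an area budget is spent.

```c
#include <stdio.h>

/* Toy profile-driven SW/HW partitioner.  Kernel names, speedups, and area
 * figures below are made up purely for illustration. */
typedef struct {
    const char *name;
    double cpu_time_s;    /* profiled time spent in this kernel          */
    double hw_speedup;    /* estimated speedup if mapped to hardware     */
    double area_frac;     /* fraction of FPGA resources the core needs   */
} kernel;

int main(void) {
    kernel k[] = {
        { "fir_filter",  4.0, 20.0, 0.30 },
        { "fft_1k",      2.5, 10.0, 0.25 },
        { "string_cmp",  1.0, 40.0, 0.15 },
        { "bookkeeping", 3.0,  1.2, 0.40 },
    };
    int n = sizeof k / sizeof k[0], used[4] = {0};
    double budget = 0.6;   /* FPGA area budget remaining */

    for (;;) {
        int best = -1; double best_ratio = 0.0;
        for (int i = 0; i < n; i++) {
            if (used[i] || k[i].area_frac > budget) continue;
            double saved = k[i].cpu_time_s * (1.0 - 1.0 / k[i].hw_speedup);
            double ratio = saved / k[i].area_frac;   /* time saved per unit area */
            if (ratio > best_ratio) { best_ratio = ratio; best = i; }
        }
        if (best < 0) break;                         /* nothing else fits */
        used[best] = 1;
        budget -= k[best].area_frac;
        printf("map %-11s to HW (saves %.2fs, area left %.2f)\n",
               k[best].name,
               k[best].cpu_time_s * (1.0 - 1.0 / k[best].hw_speedup),
               budget);
    }
    return 0;
}
```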
2008 CHREC Projects

University of Florida Site
- F1-08: System-Level Formulation for Alg/Arch Exp
  - Abstraction layer exploring complex alg. & arch. formulations
- F2-08: Application Performance Analysis
  - Extending run-time performance analysis to HLL-based RC apps
- F3-08: Case Studies in Multi-FPGA Application Design
  - New insight in multi-device apps, extend RAT prediction models
- F4-08: Reconfigurable Fault Tolerance & Partial Reconfig.
  - System-level FT, exploiting RTR and PR for dynamic response to rad hazards
- F5-08: Device Characterization & Design Space Exploration
  - Extending studies to broader range of RC devices (FPCA, ECA, TILE, etc.)

George Washington University Site
- G5-08: Library Portability for HLL Acceleration Cores
  - Exploring and defining portable interface framework (PFIF)
- G6-08: Intelligent Deployment of IP Cores
  - Identify HW tasks, deploy intelligently (grouping, IP interconnect)
- G7-08: Partial Run-Time Reconfiguration for HPRC
  - Explore PR for HPC apps to reduce RTR delay, HW virtualization

[Figure: example bottleneck analysis of a multi-CPU/multi-FPGA system, showing per-link throughput (MB/s to GB/s) and utilization percentages across CPUs 0-5, FPGAs 0-2, network, and CPU interconnect, plus a throughput-vs-time trace with application phases and idle time.]
2008 CHREC Projects

Brigham Young University Site
- B1-08: Core Library Framework for HPC/HPEC
  - Framework for encapsulating details of reusable circuit cores
- B2-08: Heterogeneous Architectures for HPEC RC
  - Device characterizations with RC/Fixed hybrids (FPGA, Cell, GPU)
- B3-08: High-Reliability RC Design Tools and Techniques
  - Device-level FT, auto. insertion of SEU mitigation, SEU estimation & detection (see the TMR sketch after this slide)
- B4-08: Reliable RC DSP/Comm Systems
  - Application-specific techniques for DSP/communications system design

Virginia Tech Site
- V1-08: Model-Based Engineering Framework for HPRC Applications
  - Adapt model-based design methods for RC, feature SDR for case studies
- V2-08: Process-to-Core Mapping for Advanced Architectures
  - Explore process-to-core mappings for hybrid multicore and RC architectures

[Figures: a core-library stack linking HLLs (Matlab/Fortran, C/C++/SysC) and tools (e.g. JHDL, Coregen) to vendor and OpenFPGA libraries through a library standard; a multi-FPGA platform diagram (FPGAs A/B/C with SRAM, 32-data/3-control links, processor configuration); and the model-based engineering hierarchy of computation-independent, platform-independent, and platform-specific models.]
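B3-08's "auto. insertion of SEU mitigation" commonly refers to techniques such as triple modular redundancy (TMR) with majority voting. As a purely illustrative software analogy (in real RC systems the triplication and voters are inserted into the FPGA design by automated tools), the voting logic looks like this:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative analogy of TMR: the same computation is replicated three
 * times and a per-bit majority vote masks a single-event upset (SEU) in
 * any one replica.  Only the voting logic is shown here. */

static uint32_t majority3(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (a & c) | (b & c);            /* per-bit majority vote */
}

static uint32_t compute(uint32_t x) { return x * 2654435761u; }  /* any kernel */

int main(void) {
    uint32_t x = 42;
    uint32_t r1 = compute(x);
    uint32_t r2 = compute(x) ^ (1u << 7);          /* simulate an SEU flipping bit 7 */
    uint32_t r3 = compute(x);
    printf("voted result correct: %s\n",
           majority3(r1, r2, r3) == compute(x) ? "yes" : "no");
    return 0;
}
```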
Conclusions

- RC making inroads in ever-broadening areas
  - HPC and HPEC; from satellites to supercomputers!
- As with any new field, early adopters are brave at heart
  - Face challenges with design methods, tools, apps, systems, etc.
  - Fragmented technologies with gaps and proprietary limitations
- Research & technology challenges abound
  - Application FDTE (Formulation, Design, Translation, Execution), device/system arch., FT, RTR, PR, etc.
  - CHREC sites and partners leading key R&D projects
- Industry/university collaboration is critical to meet challenges
  - Incremental, evolutionary advances will not lead to ultimate success
  - Researchers must take more risks, explore & solve tough problems
  - Industry & government as partners, catalysts, tech-transfer recipients
Acknowledgements

We express our gratitude for support of CHREC by:
- National Science Foundation
  - Program managers & assistants, center evaluator, panel reviewers
- CHREC industry and government partners
  - >30 members holding >40 memberships in 2008
- University administrations @ CHREC sites
  - University of Florida
  - George Washington University
  - Brigham Young University
  - Virginia Tech
- Equipment and tools vendors providing support
  - Aldec, Altera, Celoxica, Cray, DRC, Gedae, GiDEL, Impulse, Intel, Mellanox, Nallatech, SGI, SRC, Synplicity, Voltaire, XDI, Xilinx