August 1, 2008 High Performance Reconfigurable Computing Ann Gordon-Ross, Ph.D. NSF CHREC Center Assistant Professor of ECE, University of Florida (on behalf of faculty/staff of CHREC at UF, GWU, BYU, and VT) What is CHREC? NSF Center for High-Performance Reconfigurable Computing Unique US national research center in this field, established Jan’07 Leading research groups in RC/HPC/HPEC @ four major universities University of Florida (lead) George Washington University Brigham Young University Virginia Tech founding sites (2007-) expansion sites (2008-) Under auspices of I/UCRC Program at NSF Industry/University Cooperative Research Center CHREC is supported by CISE & Engineering Directorates @ NSF CHREC is both a National Center and a Research Consortium University groups serve as research base (faculty, students, staff) Industry & government organizations are research partners, sponsors, collaborators, advisory board, & technology-transfer recipients 2 Objectives for CHREC Serve as foremost national research center in this field Basis for long-term partnership and collaboration amongst industry, academe, and government; a research consortium RC: from supercomputers to high-speed embedded systems Directly support research needs of our Center members Enhance educational experience for a large set of highquality graduate and undergraduate students Highly cost-effective manner with pooled, leveraged resources and maximized synergy Ideal recruits after graduation for Center members, many US Advance knowledge and technologies in this field Commercial relevance ensured with rapid technology transfer 3 NSF Model for I/UCRC Centers Research Interaction University Industry I/U Centers Basic Applied/Development 4 CHREC Members 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 5 BLUE = founding member since 2007 ORANGE = new member in 2008 AFRL Munitions Directorate AFRL Space Vehicles Directorate Altera AMD Arctic Region Supercomputing Center Boeing Cadence GE Aviation Systems Gedae Harris Corp. >30 members with Hewlett-Packard >40 memberships in Honeywell IBM Research 2008 Intel L-3 Communications Lockheed Martin MFC Lockheed Martin SSC Los Alamos National Laboratory Luna Innovations NASA Goddard Space Flight Center NASA Langley Research Center NASA Marshall Space Flight Center National Instruments National Reconnaissance Office National Security Agency Network Appliance Office of Naval Research Raytheon Rincon Research Corp. Rockwell Collins Sandia National Laboratory NM * Underline = member supporting multiple CHREC memberships & students in 2008. Lockheed Martin SSC Arctic Region Supercomputing Rockwell Collins Center GE Aviation Systems CHREC (VT) Boeing IBM Research Gedae Intel NASA Goddard AMD NSA Cadence NRO Altera NASA Langley Network Appliance Harris CHREC (BYU) L-3 Communications CHREC (GWU) ONR Luna Innovations Rincon Research Corp. Los Alamos National Lab AFRL Space Vehicles Dir. NASA Marshall Sandia National Lab HewlettPackard Honeywell AFRL Munitions Dir. National Instruments 6 CHREC (UF) Lockheed Martin MFC Raytheon CHREC Faculty University of Florida (lead) George Washington University Dr. Tarek El-Ghazawi, Professor of ECE – GWU Site Director Dr. Ivan Gonzalez, Research Scientist / Visiting Faculty in ECE Dr. Vickram Krishnanarayana, Dr. Proshanta Saha, and Dr. Harald Simmler, Post-doc Research Scientists Brigham Young University Dr. Alan D. George, Professor of ECE – Center Director Dr. Herman Lam, Associate Professor of ECE Dr. K. Clint Slatton, Assistant Professor of ECE and CCE Dr. Greg Stitt, Assistant Professor of ECE Dr. Ann Gordon-Ross, Assistant Professor of ECE Dr. Saumil Merchant, Post-doc Research Scientist Dr. Brent E. Nelson, Professor of ECE – BYU Site Director Dr. Michael J. Wirthlin, Associate Professor of ECE Dr. Michael Rice, Professor of ECE Dr. Brad L. Hutchings, Professor of ECE Virginia Tech Dr. Shawn A. Bohner, Associate Professor of CS – VT Site Director Dr. Peter Athanas, Professor of ECE Dr. Wu-Chun Feng, Associate Professor of CS and ECE Dr. Francis K.H. Quek, Professor of CS 7 CHREC features a strong team of >40 graduate students spanning the four university sites. Membership Fee Structure NSF provides base funds for CHREC via I/UCRC grants Industry and govt. partners support CHREC through memberships Base grant to each participating university site to defray admin costs NOTE: Each membership is associated with ONE university Partners may hold multiple memberships (supporting multiple students) at one or multiple sites (AFRL/MD, GSFC, Honeywell, LANL, MSFC, NRO, NSA, & SNL in ‘08) Membership Fee: $35K per annum Why $35K unit? Base cost of graduate student for one year Fee represents tiny fraction of budget (1-2%) & benefits of Center Stipend, tuition, and related expenses (IDC is waived, otherwise >$50K) CHREC budget projected at ~$3M/yr in 2008 Equivalent to >$10M if Center founded in govt. or industry Each university invests in various costs of CHREC operations 25% matching of industry membership contributions (equip.) Indirect Costs waived on membership fees (~1.5× multiplier) Matching on administrative personnel costs 8 More bang for your buck! Center Membership Benefits Research and collaboration Leveraging and synergy Access to strong cadre of faculty, students, post-docs Recruitment Many benefits between members e.g. new industrial partnerships & teaming opportunities Personnel Highly leveraged and synergistic pool of funding resources Cost-effective R&D in today’s budget-tight environment, ideal for ROI Multi-member collaboration Selection of project topics that membership resources support Direct influence over cutting-edge research of prime interest Review of results on semiannual formal basis & continual informal basis Rapid transfer of results and IP from projects @ ALL sites of CHREC Strong pool of students with experience on industry & govt. R&D issues Facilities Access to university research labs with world-class facilities e.g. RC testbed equipment from Alpha Data, Altera, Celoxica, Cray, DRC, GiDEL, MathStar, Nallatech, SGI,9 SRC, XDI, Xilinx, et al. plus custom CHREC & OpenFPGA Research context and support CHREC OpenFPGA Community Technology innovations Production Utilization 10 Diagram c/o Dr. Eric Stahlberg Education & Outreach CHREC is enabling advancements at all its sites New & updated courses Degree curricula enhancements Student internship connections Visiting scholars Example: new RC courses @ Florida site New undergraduate (EEL4930) & graduate (EEL5934) dual-listed courses in RC began in Fall Term 2007 Lectures, lab experiments, research projects Fundamental topics Special topics from research in CHREC Supported by new RC teaching cluster Sponsored in part by $25K education grant from Rockwell Collins Cluster of 21 RC servers each housing Nallatech PCI-X card with Xilinx Virtex-4 LX100 user FPGA 11 Reformations and RC 12 Architecture Reformation End of wave (Moore’s Law) riding fclk + ILP (CPU) Many promising technologies on new wave Explicit parallelism & multicore the new wave Fixed & reconfigurable multicore device architectures Many R&D challenges lie on new wave Tried & true methods no longer sufficient; complexity abounds Semantic gap widening between applications & systems Inherent traits of fixed device architectures e.g. App developers must now understand & exploit parallelism App-specific: inflexible, expensive (e.g. ASIC) App-generic: power, cooling, & speed challenges (e.g. Opteron) Many niches between extremes (Cell, DSP, GPU, NP, etc.) Reconfigurable architectures promise best of both worlds Speed, flexibility, low-power, adaptability, economy of scale, size Bridging embedded & general-purpose computing, superset of fixed 13 What is a Reconfigurable Computer? System capable of changing hardware structure to address application demands Static or dynamic reconfiguration Reconfigurable computing, configurable computing, custom computing, adaptive computing, etc. Flexibility Often a mix of conventional fixed & reconfigurable devices (e.g. control-flow CPUs, data-flow FPLDs) Enabling technology? FPGA Field-programmable multicore devices FPGA is “King” (but space is broadening) Applications? ECA FPCA FPOA MPPA TILE XPP et al. Vast range – computing and embedded worlds Faster, smaller, less power & heat, adaptable & versatile, selectable precision, high comp. density 14 Reconfigurable Computing (e.g. FPGAs) General-Purpose Processors Special-Purpose Processors (e.g. DSPs, NPs) ASICs Performance Opportunities for RC? 10-100x speedups with 2-10x energy savings not uncommon 15 When and Where to Apply RC? When do we need? Hardware gates targeted to application-specific requirements System mission or applications change over time When the environment is restrictive Power When performance & versatility are critical Performance Limited power, weight, area, volume, etc. Limited communications bandwidth for work offload When autonomy and adaptivity are paramount Where do we need? In conventional servers, clusters, and supercomputers (HPC) Field-programmable hardware fits many demands High DoP, finer grain, direct data-flow mapping, bit manipulation, selectable precision, direct control over H/W (e.g. perf. vs. power) In space, air, sea, undersea, and ground systems (HPEC) Embedded & deployable systems can reap many advantages w/ RC 16 Multicore/Many-Core Taxonomy Riding the new MC wave of Moore’s Law MC Multi-Core Reconfigurable Architecture Homogeneous Devices with segregated RA & FA resources; can use either in standalone mode Fixed Architecture Hybrid Heterogeneous Heterogeneous Homogeneous Heterogeneous Spectrum of Granularity In Each Class Reconfigurability Datapath Device Memory PE/Block Precision Interface 17 Mode Power Interconnect Reconfigurability Factors Datapath Register Register + x Register Register PE/Block Precision 8x8 8x8Multiply MAC 64-bit 24-bit Multiply (Processing Element) (Processing Element) Register 64 KB X 64 32 64 32 Register Device Memory Register x PE Register Interface RC Device Mode PE1 Prg-A Power PE2 Prg-B Prg-A RLDRAM DDR2 Memory Memory Controller PE3 Prg-C Prg-A PE4 Prg-D Prg-A Interconnect PE PE PE PE PE PE PE PE PE PE PE MEM Performance RLDRAM DDR2 SDRAM SDRAM Power 18 MEM PE PE MEM MEM Future Convergence Rising development costs & other factors drive convergence Source: ASIC Design in the Silicon Sandbox: A Complete Guide to Building Mixed-Signal Integrated Circuits. © 2006, the McGraw-Hill Companies. As seen in many other technologies Device architecture convergence? Many-core driven by densities Heterogeneous? Cell as initial example Intel and AMD both cite heterogeneous MC in their future To extent complexity is manageable Homogeneous Fixed architecture Heterogeneous Reconfigurable Multi-/manycores Performance + versatility Adaptive for many apps, missions Avoid limitations of fixed architectures Manage issues of heterogeneity Homogeneous Reconfigurable architecture Heterogeneous Hybrid Architecture 19 Heterogeneous Application Reformation Dawn of reformation in application development methods Driven by architecture reformation; complexity management Holistic concepts, methods, & tools must emerge Semantic gap widening between apps & archs MC world (fixed or RC), explicit parallelism Optimizing compiler ≠ parallelizing compiler Architectures increasingly complex to target by apps New to fixed MC world, familiar to RC/FPGA & HPC worlds Domain scientist involved in comp. structure of their app How do we bridge semantic gap? Focus upon computational fundamentals Formal models, complexity management via abstraction, encapsulation Learn lessons from other engineering fields Elephant in Living Room e.g. aerospace engineers do not flight-test first, why must we? Build basis for an RC engineering discipline Leverage where practical for fixed MC world 20 Reformation in App Development FDTE as formal model Spectrum of application development phases I. Formulation I. Formulation (a) Algorithm design exploration Strategic design playground, abstraction, prediction (b) Architecture design exploration (c) Performance prediction (speed, area, etc.) II. Design II. Design (a) Linguistic design semantics and syntax Tactics, coding, details (b) Graphical design semantics and syntax III. Translation (c) Hardware/software codesign Conversion to executable form (a) Compilation IV. Execution (b) Libraries and linkage Services, debug, optimization (c) Technology mapping (synthesis, place & route) Applies throughout computing III. Translation We focus on RC, which involves hardware & software design 21 IV. Execution (a) Test, debug, and verification (b) Performance analysis and optimization (c) Run-time services Research Activities 22 F1-07: Simulative Performance Prediction Before you invest major $$$ in new systems, software design, & hardware design, better to first predict potential benefits F1 Performance Prediction F2 Performance Analysis F2-07: Performance Analysis & Profiling Without new concepts and powerful tools to locate and resolve performance bottlenecks, max. speedup is extremely elusive F3-07: Application Case Studies RC for HPC or HPEC is relatively new & immature; need to F3 build/share new knowledge with apps & tools from case studies F4-07: Partial Run-Time Reconfiguration Many potential advantages to be gained in performance, adaptability, power, safety, fault tolerance, security, etc. F4 Systems Architecture F5 Device Architecture F5-07: FPLD Device Architectures & Tradeoffs How to understand and quantify performance, power, et al. advantages of FPLDs vs. competing processing technologies 23 Application Case Studies & HLLs Performance, Adaptability, Fault Tolerance, Scalability, Power, Density 2007 CHREC Projects (Florida Site) 2007 CHREC Projects (GWU Site) G1-07: SW/HW Partitioning & Co-Scheduling Algorithms and tools for profiling, partitioning, coscheduling, and targeting of RC Systems G4-07: High-Level Languages Productivity Insight to understand underlying differences among available tools, guide programmer in choosing correct language, impact future HLL development G5-07: Library Portability and Acceleration Cores Framework for portable and reusable library of hardware cores populated with key modules for RC Initial focus upon case studies in computational biology & medical imaging 24 2008 CHREC Projects Network Potential Bottlenecks CPU Interconnect 691MB/s 67% CPU 0 University of Florida Site PHASE 1 9% PHASE 2 16% 0MB/s 0% New insight in multi-device apps, extend RAT prediction models System-level FT, exploiting RTR and PR for dynamic response to rad hazards Extending studies to broader range of RC devices (FPCA, ECA, TILE, etc.) Exploring and defining portable interface framework (PFIF) G6-08: Intelligent Deployment of IP Cores 0MB/s 0% Extending run-time performance analysis to HLL-based RC apps G5-08: Library Portability for HLL Acceleration Cores FPGA 1 CPU 4 Abstraction layer exploring complex alg. & arch. formulations George Washington University Site 210MB/s 10% CPU 3 6MB/s 0% F5-08: Device Characterization & Design Space Exploration FPGA 0 CPU 2 F4-08: Reconfigurable Fault Tolerance & Partial Reconfig. 812MB/s 79% F3-08: Case Studies in Multi-FPGA Application Design 10MB/s 1% F2-08: Application Performance Analysis CPU 1 1.79GB/s 72% IDLE 75% F1-08: System-Level Formulation for Alg/Arch Exp 121MB/s 12% Identify HW tasks, deploy intelligently (grouping, IP interconnect) G7-08: Partial Run-Time Reconfiguration for HPRC Explore PR for HPC apps to reduce RTR delay, HW virtualization 25 2.76GB/s 69% 914MB/s 89% Throughput (MB/s) Time (sec) 904MB/s 88% 1.98GB/s 99% CPU 5 2.50GB/s 100% FPGA 2 2008 CHREC Projects HLL (Matlab/Fortran) HLL (C/C++/SysC) Tools Library Standard Brigham Young University Site B1-08: Core Library Framework for HPC/HPEC Vendor1 OpenFPGA SRAM FPGA B SRAM 32 data 3 control FPGA C 32 data 3 control Processor configuration Device characterizations with RC/Fixed hybrids (FPGA, Cell, GPU) FPGA A SRAM Device-level FT, auto. insertion of SEU mitigation, SEU estimation & detection B4-08: Reliable RC DSP/Comm Systems JHDL Framework for encapsulating details of reusable circuit cores B3-08: High-Reliability RC Design Tools and Techniques … Coregen B2-08: Heterogeneous Architectures for HPEC RC Libraries Application-specific techniques for DSP/communications system design Computation Independent Models Virginia Tech Site V1-08: Model-Based Engineering Framework for HPRC Applications Adapt model-based design methods for RC, feature SDR for case studies V2-08: Process-to-Core Mapping for Advanced Architectures Platform Independent Models Explore process-to-core mappings for hybrid multicore and RC architectures 26 Platform Specific Models Conclusions 29 Conclusions RC making inroads in ever-broadening areas HPC and HPEC; from satellites to supercomputers! As with any new field, early adopters are brave at heart Face challenges with design methods, tools, apps, systems, etc. Fragmented technologies with gaps and proprietary limitations Research & technology challenges abound Formulation Design Application FDTE, device/system arch., FT, RTR, PR, etc. Translation CHREC sites and partners leading key R&D projects Execution Industry/university collaboration is critical to meet challenges Incremental, evolutionary advances will not lead to ultimate success Researchers must take more risks, explore & solve tough problems Industry & government as partners, catalysts, tech-transfer recipients 30 Acknowledgements We express our gratitude for support of CHREC by National Science Foundation CHREC Industry and Government Partners >30 members holding >40 memberships in 2008 University administrations @ CHREC sites Program managers & assistants, center evaluator, panel reviewers University of Florida George Washington University Brigham Young University Virginia Tech Equipment and tools vendors providing support Aldec, Altera, Celoxica, Cray, DRC, Gedae, GiDEL, Impulse, Intel, Mellanox, Nallatech, SGI, SRC, Synplicity, Voltaire, XDI, Xilinx 31