12 Jan 2012 – 2012 NSF I/UCRC Annual Meeting

CHREC and Novo-G: An Innovative and Synergistic Research Project and the World's Most Powerful Reconfigurable Supercomputer
Alan D. George, Ph.D. – Director, NSF CHREC Center; Professor of ECE, University of Florida
Herman Lam, Ph.D. – Assoc. Professor of ECE, University of Florida
Research highlighted in this presentation was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC-0642422.

Outline
CHREC
• Center overview
• CHREC sites, faculty, & students
• CHREC members & memberships
• Industry impact & technology transfer
Novo-G
• Reconfigurable computing overview
• Machine architecture
• Application acceleration
• International Novo-G Forum
Conclusions and Looking Ahead

Center Overview
I/UCRC grant originated in Sep. 2006
• 1 university site, 9 membership commitments
Strong growth in first 5 years (Phase-I)
• Grown to 4 university sites (UF, GW, BYU, VT)
• Grown to 29 members (aerospace, IT, etc.)
• Grown to 42 memberships (all full, $35K/ea)
Strong scholarship record
• >115 refereed journal & conference papers
• Several NSF CAREER awards
• Best-paper awards, keynotes, etc.
Strong graduation record
• Dozens of Ph.D. & M.S. graduates to date
• Many hired by CHREC members
• Dozens more served with members as interns
World-class facilities developed in-house
• Novo-G: world's top reconfigurable computer
• HokieSpeed: GPU-centric supercomputer
• Pyramid: CPU-centric supercomputer

Center Mission and Theme
• R&D to advance S&T in the nexus of reconfigurable, high-performance, and/or high-performance embedded computing (i.e., RC, HPC, HPEC)
• Computing performance, power, adaptivity, scalability, productivity, cost, size, weight, etc.
• From space satellites to supercomputers!

CHREC Faculty
University of Florida (lead)
• Dr. Alan D. George, Professor of ECE – Center Director
• Dr. Herman Lam, Associate Professor of ECE
• Dr. Ann Gordon-Ross, Assistant Professor of ECE
• Dr. Greg Stitt, Assistant Professor of ECE
• Dr. Jose Principe, Distinguished Professor of ECE and BME
• Dr. Andy Li, Associate Professor of ECE
• Dr. Vikas Aggarwal, Research Scientist in ECE
Brigham Young University
• Dr. Brent E. Nelson, Professor of ECE – BYU Site Director
• Dr. Michael J. Wirthlin, Professor of ECE
• Dr. Brad L. Hutchings, Professor of ECE
• Dr. Michael Rice, Professor of ECE
George Washington University
• Dr. Tarek El-Ghazawi, Professor of ECE – GWU Site Director
• Dr. Vikram Narayana, Assistant Research Professor in ECE
Virginia Tech
• Dr. Peter Athanas, Professor of ECE – VT Site Director
• Dr. Wu-Chun Feng, Associate Professor of CS and ECE
• Dr. Patrick Schaumont, Assistant Professor of ECE
• Dr. Heshan Lin, Senior Research Associate in CS
Most importantly, CHREC features an exceptional team of >40 graduate students spanning our 4 university sites.

CHREC Members
42 memberships ($35K/ea) from 29 members in 2011:
• AFRL Munitions Directorate (4)
• AFRL Sensors Directorate
• AFRL Space Vehicles Directorate (2)
• Altera
• AMD
• Arctic Region Supercomputing Center (2)
• Army RD&E Command
• Boeing Research & Technology
• GiDEL
• Harris
• Honeywell (2)
• Intel
• Lockheed Martin MFC
• Lockheed Martin SSC
• Lockheed Martin SVIL
• Los Alamos National Laboratory (2)
• Mentor Graphics
• Monsanto
• NASA Goddard Space Flight Center
• NASA Marshall Space Flight Center
• National Instruments (2)
• National Security Agency (4)
• Northrop-Grumman Aerospace Systems
• Oak Ridge National Laboratory (2)
• Office of Naval Research
• Sandia National Laboratories
• SEAKR Engineering
• Veritomyx
• Xilinx (2)

Industry Impact & Tech Transfer
12 projects spanning broad areas of RC, HPC, HPEC
• Performance – optimizing speed, power, scalability, adaptability; parallel algorithms, applications, and architectures (FPGA, GPU, manycore)
• Productivity – reducing design complexity for developers and users; design concepts, tools, modeling, middleware, compilation, integration
• Aerospace – addressing unique needs in this key community; space-based processing, reliable architectures, partial reconfiguration
Industry impact
• CHREC drives & influences many industry programs
• Annual surveys routinely cite millions of dollars per year in industry impact
• Many very close relationships between sites & members
Technology transfer to date
• Dozens of industry personnel hires, dozens of internships
• >115 new papers and >30 new tools crafted with/for members

What is Reconfigurable Computing?
General characteristics:
• Architecture adapts to match the unique needs of each app (e.g., FPGA; "custom fit" usage strategy; reconfigurable by task or app)
• Relatively new and revolutionary paradigm of computing
• Limited but growing list of available devices, tools, systems, and apps
Technical advantages:
• GREAT performance when an app is not well suited to a fixed processor
  – Why? Customized hardware parallelism (width, depth), data precision (size, format), operations and units (type, quantity), memory structure, etc. (illustrated in the sketch following the Novo-G overview below)
• LOWER energy consumption than fixed processors (CPU, GPU)
Technical disadvantages:
• Relatively new and immature paradigm of computing
• Programming complexity with adaptive hardware
  – Causes: inherent in the novelty of the approach; "newness" of the field and its tools

What is Novo-G?
Motivation
• Growing computational demands in many science and engineering domains are becoming the principal bottleneck
• Scalable RC systems (e.g., Novo-G) are uniquely capable of both high performance and low energy, cooling, and TCO
Goals: investigate, develop, evaluate, & showcase:
• Most powerful RC machine ever fielded for research
• Innovative suite of productivity tools for app development
• Impactful set of scalable kernels/apps in key science areas
Emphases: Performance (system), Productivity (concepts/tools), Impact (apps)
Theme: Novo-G is an RC-centric machine (not merely CPUs with accelerators!)
Features
• FPGA/RAM coupling (4.25 or 8.5 GB in 3 banks coupled to each FPGA)
• FPGA/FPGA coupling (up to 8 coupled; e.g., systolic array, virtual FPGA)
• CPUs and GPUs serve in a supporting role (e.g., I/O, preprocessing, postprocessing)
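To make the "customized parallelism and precision" point above concrete, here is a minimal, purely illustrative C sketch (not taken from any Novo-G design): a dot-product kernel written the way one might feed it to an HLS flow, with a hypothetical unroll factor and reduced-precision fixed-point data. On a CPU it runs as ordinary C; on an FPGA, the commented loop structure is the kind of thing a tool or hand-written HDL would turn into parallel hardware.

```c
/* Illustrative sketch only: shows the kind of customization an FPGA kernel
 * can exploit (reduced-precision operands, explicit datapath parallelism).
 * Bit-widths and the UNROLL factor are assumptions, not Novo-G parameters. */
#include <stdint.h>
#include <stdio.h>

#define N      1024
#define UNROLL 8      /* hypothetical hardware parallelism: 8 MACs per cycle */

/* 16-bit fixed-point inputs; in hardware the accumulator would be sized to
 * exactly the bits needed (e.g., about 40 bits); int64_t stands in for it here. */
static int64_t dot_fixed(const int16_t a[N], const int16_t b[N])
{
    int64_t acc[UNROLL] = {0};           /* UNROLL independent accumulators */
    for (int i = 0; i < N; i += UNROLL)
        for (int j = 0; j < UNROLL; j++) /* fully unrolled in hardware */
            acc[j] += (int32_t)a[i + j] * (int32_t)b[i + j];

    int64_t sum = 0;                     /* final reduction */
    for (int j = 0; j < UNROLL; j++)
        sum += acc[j];
    return sum;
}

int main(void)
{
    int16_t a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = (int16_t)(i & 0xFF); b[i] = 1; }
    printf("dot = %lld\n", (long long)dot_fixed(a, b));
    return 0;
}
```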
Novo-G Machine (cluster vendor: Ace Computers)
Novo-G annual growth
• 2009: 96 top-end Stratix-III FPGAs, each with 4.25 GB SDRAM
• 2010: 96 more Stratix-III FPGAs, each with 4.25 GB SDRAM
• 2011: 96 top-end Stratix-IV FPGAs, each with 8.5 GB SDRAM
• 2012: 96 more Stratix-IV FPGAs, each with 8.5 GB SDRAM
Configuration
• 1 head-node server (1U) with 2 Xeon E5520 2.26 GHz quad-core CPUs; 24 GB ECC DDR3; 3 x 1 TB SATA2
• 24 compute servers (4U), each with a Xeon E5520 quad-core CPU; 6 GB ECC DDR3; 250 GB SATA2; and 2 GiDEL ProcStar-III PCIe x8 cards, each card carrying 4 Stratix-III E260 FPGAs and 4 x 4.25 = 17 GB RAM (192 Stratix-III FPGAs total)
• 6 compute servers (4U), each with 2 Xeon E5620 2.4 GHz quad-core CPUs; 16 GB ECC DDR3; 2 TB SATA2; a GTX-480 GPU; and 4 GiDEL ProcStar-IV PCIe x8 cards, each card carrying 4 Stratix-IV E530 FPGAs and 4 x 8.5 = 32 GB RAM (96 Stratix-IV FPGAs total)

Impactful Novo-G App Research: BioRC examples
Needleman-Wunsch (NW): optimal algorithm for global alignment of DNA and RNA sequences
Smith-Waterman (SW) with trace-back: optimal algorithm for local alignment of DNA and RNA sequences
Design highlights
• Novel systolic array architecture (a plain-software reference for the underlying recurrence follows the app-research slides below)
• Complex-controller performance with simple-controller overhead
• Extendable across FPGAs using the neighbor bus
• Computation of trace-back for SW overlapped with hardware processing of the next sequence

Smith-Waterman with trace-back
Baseline: database of 2^26 bases vs. 512 sequences of length 500; software runtime 7,126 CPU hours on a 2.4 GHz Opteron

  # FPGAs      Runtime (sec)   Speedup
  1            25,927          989
  4            6,482           3,958
  96           271             94,639
  128          206             124,710
  192 (est.)   137             187,492

Needleman-Wunsch
Baseline: 192 x 2^25 length-850 sequence comparisons; software runtime 11,026 CPU hours on a 2.4 GHz Opteron

  # FPGAs      Runtime (sec)   Speedup
  1            47,616          833
  4            12,014          3,304
  96           503             78,914
  128          391             101,518
  192 (est.)   270             147,013

[3D charts omitted] Each 3D chart (for Smith-Waterman and Needleman-Wunsch) illustrates the performance of a single FPGA under varying input conditions; each table shows scaling performance with a varying number of FPGAs under optimal input conditions.

For comparison:
• Jaguar supercomputer @ ORNL: 224,256 cores (2.4 GHz hexa-core Opterons) at 6.95 MW
• K Computer in Japan (largest supercomputer in the world): 548,352 cores; "uses enough electricity to power almost 10,000 homes at a cost of about $10 million per year" (New York Times, 06/19/11)
• By contrast, with 192+192 FPGAs (Summer 2012), Novo-G's speedup for key BioRC apps is approaching that of 500K cores, at <16 kW

Broad Range of Novo-G App Research
Broad range of Novo-G research
• BioRC: Smith-Waterman (with or without trace-back), Needleman-Wunsch, Needle-Distance, isoformic proteomics, BLASTp (collaboration with Boston University), CHREC BLAST Toolset (Novo-BLAST and BSW: BLAST-wrapped SW)
• FinRC: e.g., barrier options using the Heston model
• DSP: e.g., information-theoretic approach to image segmentation
• Domain exploration in other science and engineering fields
• Very promising results (speed, energy): 50x to 5000x speedup per FPGA vs. a fast CPU core

BioRC Technology Transfer
• CHREC BLAST Toolset (Monsanto)
  – Computation demand in bioinformatics is becoming a prohibitive bottleneck
  – Novo-BLAST: accelerates BLAST's word-matching algorithm up to 19x on a single Stratix III (see the word-matching sketch below)
  – BLAST-wrapped SW (BSW): Smith-Waterman core (previous slide) with a BLAST wrapper; SSEARCH-like accuracy with BLAST-like performance
  – Code transfer & field test in 1st qtr. 2012
• Isotopic Pattern Calculator (Veritomyx)
  – Dominating bottleneck in a proteomics app for cancer research
  – Measured up to 470x speedup for a single Stratix IV FPGA
  – Code transfer & field test in 1st qtr. 2012
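As a reference for the Smith-Waterman/Needleman-Wunsch results above, the short C program below spells out the Smith-Waterman recurrence in plain software form (scoring only, linear gap penalty, trace-back omitted). The scoring parameters and example sequences are illustrative assumptions; the Novo-G designs evaluate this same recurrence with a systolic array of processing elements, one per query character, chained across FPGAs over the neighbor bus.

```c
/* Plain software reference for the Smith-Waterman local-alignment recurrence.
 * Match/mismatch/gap scores are illustrative; the Novo-G hardware computes
 * the same recurrence with one processing element per query character. */
#include <stdio.h>
#include <string.h>

#define MATCH     2
#define MISMATCH -1
#define GAP      -1

static int max3(int a, int b, int c) { int m = a > b ? a : b; return m > c ? m : c; }

/* Best local-alignment score of query q against subject s. Cells on the same
 * anti-diagonal of H depend only on earlier anti-diagonals, which is what a
 * systolic array exploits to update all processing elements every cycle. */
static int sw_score(const char *q, const char *s)
{
    int m = (int)strlen(q), n = (int)strlen(s), best = 0;
    int H[m + 1][n + 1];            /* C99 VLA; fine for short examples */
    memset(H, 0, sizeof H);

    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++) {
            int diag = H[i-1][j-1] + (q[i-1] == s[j-1] ? MATCH : MISMATCH);
            int h = max3(diag, H[i-1][j] + GAP, H[i][j-1] + GAP);
            H[i][j] = h > 0 ? h : 0; /* local alignment: never below zero */
            if (H[i][j] > best) best = H[i][j];
        }
    return best;
}

int main(void)
{
    printf("score = %d\n", sw_score("ACACACTA", "AGCACACA"));
    return 0;
}
```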
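The Novo-BLAST item above centers on BLAST's word-matching (seeding) stage. The hypothetical C sketch below shows what that stage does in its simplest form: index every length-W word of the query, then stream the subject sequence and report exact W-mer hits. The word length, 2-bit DNA encoding, and example sequences are assumptions chosen for brevity and do not reflect the Novo-BLAST implementation.

```c
/* Hypothetical illustration of BLAST-style word matching (seeding), the stage
 * Novo-BLAST accelerates in hardware. Not the Novo-BLAST implementation. */
#include <stdio.h>

#define W     8                     /* word length (illustrative) */
#define TABLE (1u << (2 * W))       /* one flag per possible 2-bit-encoded word */

static unsigned char seen[TABLE];   /* word -> present anywhere in the query? */

/* Map A,C,G,T to distinct 2-bit codes (0,1,3,2). */
static unsigned base2(char c) { return ((unsigned)c >> 1) & 3u; }

int main(void)
{
    const char *query   = "ACGTACGTTTGACCA";
    const char *subject = "TTTTACGTACGTTTGAAAA";
    unsigned w = 0, mask = TABLE - 1;

    /* Index every W-mer of the query with a rolling 2W-bit code. */
    for (int i = 0; query[i]; i++) {
        w = ((w << 2) | base2(query[i])) & mask;
        if (i >= W - 1) seen[w] = 1;
    }

    /* Stream the subject and report seed hits (exact W-mer matches). */
    w = 0;
    for (int i = 0; subject[i]; i++) {
        w = ((w << 2) | base2(subject[i])) & mask;
        if (i >= W - 1 && seen[w])
            printf("seed hit ending at subject position %d\n", i);
    }
    return 0;
}
```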
International Novo-G Forum
• Founded in January 2010
• International community research forum to explore performance, productivity, and sustainability of RC at scale
• Consists of 11 academic teams using a common platform
  – Each team working on its own research apps and/or tools
  – Each team has one or more local Novo-G quad-FPGA boards
  – Remote access to the big Novo-G @ Florida for large-scale runs
• Member teams: Boston University, Clemson University, Federal University of Pernambuco (Brazil), University of Florida, George Washington University, University of Glasgow (UK), Imperial College (UK), Northeastern University, University of South Carolina, University of Tennessee, Washington University in St. Louis

Conclusions and Looking Ahead
• RC: a revolutionary paradigm of computing
  – Architecture adapts to match the unique needs of each app
• CHREC Novo-G reconfigurable supercomputer
  – Most powerful RC machine ever fielded for research
  – World-class speedups for key apps in science and engineering, rivaling the world's largest conventional supercomputers but at a tiny fraction of their size, power, cost, and weight
• Synergistic activity
  – Leverages private, state, and federal funding resources
  – Close partnership with CHREC member organizations: Altera, GiDEL, Monsanto, Veritomyx, et al.
  – Novo-G Forum: international team of 11 universities
• Novo-G future: science and engineering domain exploration
  – New RC-amenable apps in BioRC, DSP, and FinRC
  – Explore promising new domains, e.g., computational chemistry, cryptanalysis

CORBI anyone?